4 By: Junkbuster Developers
6 $Id: user-manual.sgml,v 1.28 2002/02/24 14:34:24 jongfoster Exp $
This user manual explains how to install and configure Internet
Junkbuster, an application that provides privacy and security to
users of the World Wide Web.
You can find the latest version of the user manual at
http://ijbswa.sourceforge.net/user-manual/.
Feel free to send a note to the developers at
<ijbswa-developers@lists.sourceforge.net>.
17 _________________________________________________________________
3. Junkbuster Configuration
   3.1. The Main Configuration File
   3.2. The Actions File
   3.3. The Filter File
4. Quickstart to Using Junkbuster
5. Contact the Developers
6. Copyright and History
   8.1. Regular Expressions
Internet Junkbuster is a web proxy with advanced filtering
capabilities for protecting privacy, filtering web page content,
managing cookies, controlling access, and removing ads, banners,
pop-ups and other obnoxious Internet junk. Junkbuster has a very
flexible configuration and can be customized to suit individual needs
and tastes. It is suitable for both stand-alone systems and
multi-user networks.
This documentation is included with the current development version of
Internet Junkbuster and is incomplete at this point. For the time
being, the most up-to-date reference is still the comments in the
source files and in the individual configuration files. Development of
version 3.0 is currently underway and includes many significant
changes and enhancements over earlier versions. The target release date
for stable v3.0 is December 2001.
70 Since this is a development version, some features are in the process
71 of being implemented. This documentation may be slightly out of sync
72 as a result. And there are bugs, though hopefully not many!
73 _________________________________________________________________
77 In addition to Junkbuster's traditional features of ad and banner
78 blocking and cookie management, this is a list of new features
79 currently under development:
81 * A browser based configuration utility (WIP at [25]http://i.j.b).
82 * Modularized configuration that will allow for system wide
83 settings, and individual user settings. (not implemented yet,
84 probably a 3.1 feature)
85 * Blocking of annoying pop-up browser windows (previously available
87 * Support for HTTP/1.1 (partially implemented at this point).
* Support for Perl Compatible Regular Expressions in the
  configuration files, and generally a more sophisticated
  configuration syntax than previous versions.
91 * Web page content filtering.
93 * Auto-detection of config file changes.
95 In addition, the configuration is much more versatile overall.
96 _________________________________________________________________
100 Junkbuster is available as raw source code, or pre-compiled binaries.
101 See the [26]Junkbuster Home Page for current release info. Junkbuster
102 is also available via [27]CVS. This is the recommended approach at
103 this time. But please be aware that CVS is constantly changing, and it
104 may break in mysterious ways.
105 _________________________________________________________________
109 For gzipped tar archives, unpack the source:
111 tar xzvf ijb_source_* [.tgz or .tar.gz]
112 cd ijb_source_2.9.10_beta
114 For retrieving the current CVS sources, you'll need the CVS package
115 installed first. To download CVS source:
117 cvs -d:pserver:anonymous@cvs.ijbswa.sourceforge.net:/cvsroot/ijbswa login
118 cvs -z3 -d:pserver:anonymous@cvs.ijbswa.sourceforge.net:/cvsroot/ijbswa co cu
122 This will create a directory named current/, which will contain the
125 Then, in either case, to build from tarball/CVS source:
127 ./configure (--help to see options)
128 make (the make from gnu, gmake for *BSD)
130 make -n install (to see where all the files will go)
131 make install (to really install)
133 For Redhat and SuSE Linux RPM packages, see below.
134 _________________________________________________________________
138 To build Redhat RPM packages, install source as above. Then:
140 autoheader [suggested for CVS source]
141 autoconf [suggested for CVS source]
145 This will create both binary and src RPMs in the usual places.
148 /usr/src/redhat/RPMS/i686/junkbuster-2.9.10-1.i686.rpm
150 /usr/src/redhat/SRPMS/junkbuster-2.9.10-1.src.rpm
152 To install, of course:
154 rpm -Uvv /usr/src/redhat/RPMS/i686/junkbuster-2.9.10-1.i686.rpm
156 This will place the Junkbuster configuration files in
157 /etc/junkbuster/, and log files in /var/log/junkbuster/.
158 _________________________________________________________________
162 To build SuSE RPM packages, install source as above. Then:
164 autoheader [suggested for CVS source]
165 autoconf [suggested for CVS source]
169 This will create both binary and src RPMs in the usual places.
172 /usr/src/packages/RPMS/i686/junkbuster-2.9.10-1.i686.rpm
174 /usr/src/packages/SRPMS/junkbuster-2.9.10-1.src.rpm
176 To install, of course:
178 rpm -Uvv /usr/src/packages/RPMS/i686/junkbuster-2.9.10-1.i686.rpm
180 This will place the Junkbuster configuration files in
181 /etc/junkbuster/, and log files in /var/log/junkbuster/.
182 _________________________________________________________________
186 The OS/2 version of Junkbuster requires the EMX runtime library to be
187 installed. The EMX runtime library is available on the hobbes OS/2
188 archive, among many other locations:
http://hobbes.nmsu.edu/cgi-bin/h-search?sh=1&button=Search&key=emxrt.zip&stype=all&sort=type&dir=%2Fpub%2Fos2%2Fdev%2Femx%2Fv0.9d
Junkbuster is packaged in a WarpIN self-installing archive. The
self-installing program will be named depending on the release
version, something like: ijbos123.exe. To install it, simply
run this executable or double-click on its icon and follow the WarpIN
installation panels. A shadow of the Junkbuster executable will be
placed in your startup folder so it will start automatically whenever
200 The directory you choose to install Junkbuster into will contain all
201 of the configuration files.
203 If you would like to build binary images on OS/2 yourself, you will
204 need a working EMX/GCC environment, plus several Unix-like tools. The
205 Hobbes OS/2 archive is a good place to start when building such an
206 environment. A set of Unix-like tools named gnupack is located here:
http://hobbes.nmsu.edu/cgi-bin/h-search?sh=1&key=gnupack&stype=all&sort=type&dir=%2Fpub%2Fos2%2Fapps
210 Once you have the source code unpacked as above, you can build the
211 binaries from the current/ directory:
216 _________________________________________________________________
220 Click-click. (I need help on this. Not a clue here. Also for
221 configuration section below. HB.)
222 _________________________________________________________________
226 Some quick notes on other Operating Systems.
228 For FreeBSD (and other *BSDs?), the build will require gmake instead
229 of the included make. gmake is available from [30]http://www.gnu.org.
230 The rest should be the same as above for Linux/Unix.
231 _________________________________________________________________
233 3. Junkbuster Configuration
For Unix, *BSD and Linux, all configuration files are located in
/etc/junkbuster/ by default. For MS Windows and OS/2, these are all in
the same directory as the Junkbuster executable. The names and number
of configuration files have changed from previous versions, and are
subject to change as development progresses.
241 The installed defaults provide a reasonable starting point, though
242 possibly aggressive by some standards. For the time being, there are
243 only three default configuration files (this will change in time):
245 * The main configuration file is named config on Linux, Unix, BSD,
246 and OS/2, and config.txt on Windows. On Amiga, it is
247 AmiTCP:db/junkbuster/config.
* The ijb.action file is used to define various "actions" relating
  to images, banners, pop-ups, access restrictions and cookies.
  There is a CGI based editor for this file that can be accessed via
  http://i.j.b. This is the easiest method of configuring actions.
  (Still under active development. Other actions files are included
  as well with differing levels of filtering and blocking, e.g.
  ijb-basic.action.)
255 * The re_filterfile file can be used to rewrite the raw page
256 content, including text as well as embedded HTML and JavaScript.
ijb.action and re_filterfile can use Perl style regular expressions
for maximum flexibility. All files use the "#" character to denote a
comment. Such lines are not processed by Junkbuster. There is no need
to restart Junkbuster after making changes: it should detect them
automatically.
While under development, the configuration content is subject to
change. The documentation below may not be accurate by the time you
read this. Also, what constitutes a "default" setting may change, so
please check all your configuration files on important issues.
269 _________________________________________________________________
271 3.1. The Main Configuration File
273 Again, the main configuration file is named config on Linux/Unix/BSD
274 and OS/2, and config.txt on Windows. Configuration lines consist of an
275 initial keyword followed by a list of values, all separated by
276 whitespace (any number of spaces or tabs). For example:
278 blockfile blocklist.ini
This indicates that the blockfile is named "blocklist.ini".
282 A "#" indicates a comment. Any part of a line following a "#" is
283 ignored, except if the "#" is preceded by a "\".
Thus, by placing a "#" at the start of an existing configuration line,
you can make it a comment and it will be treated as if it weren't
there. This is called "commenting out" an option and can be useful to
turn off features: if you comment out the "logfile" line, Junkbuster
will not log to a file at all. Watch for the "default:" section in
each explanation to see what happens if the option is left unset (or
commented out).
Long lines can be continued on the next line by using a "\" as the
last character of the line.
296 There are various aspects of Junkbuster behavior that can be tuned.
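The line syntax described above can be sketched in a few lines of
Python. This is only an illustration of the stated rules (a keyword
followed by whitespace-separated values, "#" comments unless the "#"
is preceded by a "\", and a trailing "\" for continuation). The
function name parse_config is hypothetical, and Junkbuster's own
parser may treat edge cases differently.

```python
def parse_config(text):
    """Parse config text into (keyword, [values]) pairs.

    Illustrative only: follows the rules stated in this section,
    not Junkbuster's actual implementation.
    """
    # Join continued lines: a trailing "\" continues onto the next line.
    logical_lines = []
    pending = ""
    for raw in text.splitlines():
        if raw.endswith("\\"):
            pending += raw[:-1]
            continue
        logical_lines.append(pending + raw)
        pending = ""
    if pending:
        logical_lines.append(pending)

    entries = []
    for line in logical_lines:
        # Strip comments: "#" starts a comment unless preceded by "\".
        kept = []
        i = 0
        while i < len(line):
            ch = line[i]
            if ch == "\\" and i + 1 < len(line) and line[i + 1] == "#":
                kept.append("#")  # escaped "#": keep it as a literal
                i += 2
                continue
            if ch == "#":
                break  # the rest of the line is a comment
            kept.append(ch)
            i += 1
        fields = "".join(kept).split()
        if fields:
            entries.append((fields[0], fields[1:]))
    return entries
```

With this sketch, a commented-out line such as "#logfile logfile"
yields no entry at all, which mirrors the "commenting out" behaviour
described above.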
297 _________________________________________________________________
299 3.1.1. Defining Other Configuration Files
301 Junkbuster can use a number of other files to tell it what ads to
302 block, what cookies to accept, etc. This section of the configuration
303 file tells Junkbuster where to find all those other files.
305 On Windows, Junkbuster looks for these files in the same directory as
306 the executable. On Unix and OS/2, Junkbuster looks for these files in
307 the current working directory. In either case, an absolute path name
308 can be used to avoid problems.
310 When development goes modular and multiuser, the blocker, filter, and
311 per-user config will be stored in subdirectories of "confdir". For
312 now, only confdir/templates is used for storing HTML templates for CGI
315 The location of the configuration files:
317 confdir /etc/junkbuster # No trailing /, please.
319 The directory where all logging (i.e. logfile and jarfile) takes
320 place. No trailing "/", please:
322 logdir /var/log/junkbuster
324 Note that all file specifications below are relative to the above two
The "ijb.action" file contains patterns to specify the actions to
apply to requests for each site. Default: Cookies to and from all
destinations are kept only during the current browser session (i.e.
they are not saved to disk). Popups are disabled for all sites. All
sites are filtered if "re_filterfile" is specified. No sites are
blocked. An empty image is displayed for filtered ads and other images
(formerly "tinygif"). The syntax of this file is explained in detail
336 actionsfile ijb.action
338 The "re_filterfile" file contains content modification rules. These
339 rules permit powerful changes on the content of Web pages, e.g., you
340 could disable your favourite JavaScript annoyances, rewrite the actual
341 content, or just have some fun replacing "Microsoft" with "MicroSuck"
342 wherever it appears on a Web page. Default: No content modification,
343 or whatever the developers are playing with :-/
345 re_filterfile re_filterfile
347 The logfile is where all logging and error messages are written. The
348 logfile can be useful for tracking down a problem with Junkbuster
349 (e.g., it's not blocking an ad you think it should block) but in most
350 cases you probably will never look at it.
352 Your logfile will grow indefinitely, and you will probably want to
353 periodically remove it. On Unix systems, you can do this with a cron
354 job (see "man cron"). For Redhat, a logrotate script has been
On SuSE Linux systems, you can place a line like
"/var/log/junkbuster.* +1024k 644 nobody.nogroup" in /etc/logfiles,
with the effect that cron.daily will automatically archive, gzip, and
empty the log when it exceeds 1 MB in size.
Default: Log to a file named logfile. Comment out to disable
logging.
367 The "jarfile" defines where Junkbuster stores the cookies it
368 intercepts. Note that if you use a "jarfile", it may grow quite large.
369 Default: Don't store intercepted cookies.
If you specify a "trustfile", Junkbuster will only allow access to
sites that are named in the trustfile. You can also mark sites as
trusted referrers, with the effect that access to untrusted sites will
be granted if a link from a trusted referrer was used. The link
target will then be added to the "trustfile". This is a very
restrictive feature that typical users most probably want to leave
disabled. Default: Disabled, don't use the trust mechanism.
383 If you use the trust mechanism, it is a good idea to write up some
384 online documentation about your blocking policy and to specify the
385 URL(s) here. They will appear on the page that your users receive when
386 they try to access untrusted content. Use multiple times for multiple
387 URLs. Default: Don't display links on the "untrusted" info page.
389 trust-info-url http://www.your-site.com/why_we_block.html
390 trust-info-url http://www.your-site.com/what_we_allow.html
391 _________________________________________________________________
393 3.1.2. Other Configuration Options
395 This part of the configuration file contains options that control how
398 "Admin-address" should be set to the email address of the proxy
399 administrator. It is used in many of the proxy-generated pages.
400 Default: fill@me.in.please.
402 #admin-address fill@me.in.please
"Proxy-info-url" can be set to a URL that contains more info about
this Junkbuster installation, its configuration and policies. It is
used in many of the proxy-generated pages, and its use is highly
recommended in multi-user installations, since your users will want to
know why certain content is blocked or modified. Default: Don't show a
link to online documentation.
411 proxy-info-url http://www.your-site.com/proxy.html
413 "Listen-address" specifies the address and port where Junkbuster will
414 listen for connections from your Web browser. The default is to listen
415 on the localhost port 8000, and this is suitable for most users. (In
416 your web browser, under proxy configuration, list the proxy server as
417 "localhost" and the port as "8000").
If you already have another service running on port 8000, or if you
want to serve requests from other machines (e.g. on your local
network) as well, you will need to override the default. The syntax is
"listen-address [<ip-address>]:<port>". If you leave out the IP
address, Junkbuster will bind to all interfaces (addresses) on your
machine and may become reachable from the Internet. In that case,
consider using access control lists (ACLs) (see "aclfile" above), or
428 For example, suppose you are running Junkbuster on a machine which has
429 the address 192.168.0.1 on your local private network (192.168.0.0)
430 and has another outside connection with a different address. You want
431 it to serve requests from inside only:
433 listen-address 192.168.0.1:8000
435 If you want it to listen on all addresses (including the outside
440 If you do this, consider using ACLs (see "aclfile" above). Note: you
441 will need to point your browser(s) to the address and port that you
442 have configured here. Default: localhost:8000 (127.0.0.1:8000).
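To make the "[<ip-address>]:<port>" syntax concrete, here is a small
Python sketch that splits a listen-address value into its two parts.
The function name parse_listen_address is hypothetical and the code is
not taken from Junkbuster. It assumes that an empty address part
stands for "all interfaces", as described above.

```python
def parse_listen_address(value):
    """Split "[<ip-address>]:<port>" into (address_or_None, port).

    None for the address stands for "bind to all interfaces".
    """
    host, sep, port = value.strip().rpartition(":")
    if not sep:
        raise ValueError("expected [<ip-address>]:<port>, got %r" % value)
    return (host or None, int(port))


# The examples from the text:
print(parse_listen_address("192.168.0.1:8000"))  # inside address only
print(parse_listen_address("localhost:8000"))    # the documented default
print(parse_listen_address(":8000"))             # all interfaces
```

A result with None as the address is the case where the proxy may
become reachable from the Internet, so it is the one to pair with
access controls.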
444 The debug option sets the level of debugging information to log in the
445 logfile (and to the console in the Windows version). A debug level of
446 1 is informative because it will show you each request as it happens.
447 Higher levels of debug are probably only of interest to developers.
449 debug 1 # GPC = show each GET/POST/CONNECT request
450 debug 2 # CONN = show each connection status
451 debug 4 # IO = show I/O status
452 debug 8 # HDR = show header parsing
453 debug 16 # LOG = log all data into the logfile
454 debug 32 # FRC = debug force feature
455 debug 64 # REF = debug regular expression filter
456 debug 128 # = debug fast redirects
457 debug 256 # = debug GIF deanimation
458 debug 512 # CLF = Common Log Format
459 debug 1024 # = debug kill popups
460 debug 4096 # INFO = Startup banner and warnings.
461 debug 8192 # ERROR = Non-fatal errors
463 It is highly recommended that you enable ERROR reporting (debug 8192),
464 at least until the next stable release.
The reporting of FATAL errors (i.e. ones which crash Junkbuster) is
always on and cannot be disabled.
If you want to use CLF (Common Log Format), you should set "debug 512"
ONLY; do not enable anything else.
Multiple "debug" directives are OK - they're logical-OR'd together.
474 debug 15 # same as setting the first 4 listed above
debug 8192 # Errors - *we highly recommend enabling this*
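Because each debug level is a distinct power of two, OR-ing several
"debug" directives together is the same as adding their values. The
short Python sketch below shows the arithmetic. The bit values and
abbreviations come from the table above; the names DEBUG_BITS,
combine_debug and enabled_names are made up for this example.

```python
# Bit values and abbreviations taken from the debug table above
# (only the levels that have an abbreviation are listed).
DEBUG_BITS = {
    "GPC": 1, "CONN": 2, "IO": 4, "HDR": 8, "LOG": 16, "FRC": 32,
    "REF": 64, "CLF": 512, "INFO": 4096, "ERROR": 8192,
}


def combine_debug(levels):
    """OR together the values of several "debug" directives."""
    mask = 0
    for level in levels:
        mask |= level
    return mask


def enabled_names(mask):
    """List the abbreviations whose bit is set in mask."""
    return [name for name, bit in DEBUG_BITS.items() if mask & bit]


print(combine_debug([1, 2, 4, 8]))   # 15, the first four levels
print(enabled_names(15))             # ['GPC', 'CONN', 'IO', 'HDR']
print(combine_debug([1, 8192]))      # 8193, requests plus errors
```

So the single line "debug 15" and the four lines "debug 1", "debug 2",
"debug 4" and "debug 8" describe the same setting.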
482 Junkbuster normally uses "multi-threading", a software technique that
483 permits it to handle many different requests simultaneously. In some
484 cases you may wish to disable this -- particularly if you're trying to
485 debug a problem. The "single-threaded" option forces Junkbuster to
486 handle requests sequentially. Default: Multi-threaded mode.
"toggle" allows you to temporarily disable all of Junkbuster's filtering.
493 The Windows version of Junkbuster puts an icon in the system tray,
494 which also allows you to change this option. If you right-click on
495 that icon (or select the "Options" menu), one choice is "Enable".
496 Clicking on enable toggles Junkbuster on and off. This is useful if
497 you want to temporarily disable Junkbuster, e.g., to access a site
498 that requires cookies which you would otherwise have blocked. This can
499 also be toggled via a web browser at the Junkbuster internal address
500 of [33]http://i.j.b on any platform.
502 "toggle 1" means Junkbuster runs normally, "toggle 0" means that
503 Junkbuster becomes a non-anonymizing non-blocking proxy. Default: 1
For content filtering, i.e. the "+filter" and "+deanimate-gif"
actions, Junkbuster must buffer the entire document body. This is
potentially dangerous, since a server could just keep sending data
indefinitely and wait for your RAM to be exhausted. With
The buffer-limit option lets you set the maximum size in Kbytes that
each buffer may use. When a document's buffer exceeds this size, it
is flushed to the client unfiltered and no further attempt to filter
the rest of it is made. Remember that there may be multiple threads
running, each of which might require up to "buffer-limit" Kbytes,
unless you have enabled "single-threaded" above.
To enable the web-based ijb.action file editor, set
enable-edit-actions to 1; set it to 0 to disable the editor. Note that
you must have compiled Junkbuster with support for this feature,
otherwise this option has no effect. This internal page can be reached
at http://i.j.b.
528 Security note: If this is enabled, anyone who can use the proxy can
529 edit the actions file, and their changes will affect all users. For
530 shared proxies, you probably want to disable this. Default: enabled.
532 enable-edit-actions 1
Allow Junkbuster to be toggled on and off remotely, using your web
browser. Set "enable-remote-toggle" to 1 to enable, and 0 to disable.
Note that you must have compiled Junkbuster with support for this
feature, otherwise this option has no effect.
539 Security note: If this is enabled, anyone who can use the proxy can
540 toggle it on or off (see [35]http://i.j.b), and their changes will
541 affect all users. For shared proxies, you probably want to disable
542 this. Default: enabled.
544 enable-remote-toggle 1
545 _________________________________________________________________
547 3.1.3. Access Control List (ACL)
549 Access controls are included at the request of some ISPs and systems
550 administrators, and are not usually needed by individual users. Please
551 note the warnings in the FAQ that this proxy is not intended to be a
552 substitute for a firewall or to encourage anyone to defer addressing
553 basic security weaknesses.
If no access settings are specified, the proxy talks to anyone that
connects. If any access settings are specified, then the proxy
talks only to IP addresses permitted somewhere in this file and not
denied later in this file.
560 Summary -- if using an ACL:
562 Client must have permission to receive service.
564 LAST match in ACL wins.
566 Default behavior is to deny service.
568 The syntax for an entry in the Access Control List is:
570 ACTION SRC_ADDR[/SRC_MASKLEN] [ DST_ADDR[/DST_MASKLEN] ]
572 Where the individual fields are:
574 ACTION = "permit-access" or "deny-access"
575 SRC_ADDR = client hostname or dotted IP address
576 SRC_MASKLEN = number of bits in the subnet mask for the source
577 DST_ADDR = server or forwarder hostname or dotted IP address
578 DST_MASKLEN = number of bits in the subnet mask for the target
580 The field separator (FS) is whitespace (space or tab).
IMPORTANT NOTE: If Junkbuster is using a forwarder (see below) or
a gateway for a particular destination URL, the DST_ADDR that is
examined is the address of the forwarder or the gateway and NOT the
address of the ultimate target. This is necessary because it may be
impossible for the local Junkbuster to determine the address of the
ultimate target (that's often what gateways are used for).
589 Here are a few examples to show how the ACL features work:
591 "localhost" is OK -- no DST_ADDR implies that ALL destination
594 permit-access localhost
A silly example to illustrate permitting any host on the same class-C
subnet as www.junkbusters.com to go anywhere:
599 permit-access www.junkbusters.com/24
601 Except deny one particular IP address from using it at all:
603 deny-access ident.junkbusters.com
605 You can also specify an explicit network address and subnet mask.
606 Explicit addresses do not have to be resolved to be used.
608 permit-access 207.153.200.0/24
610 A subnet mask of 0 matches anything, so the next line permits
613 permit-access 0.0.0.0/0
615 Note, you cannot say:
619 to allow all *.org domains. Every IP address listed must resolve
622 An ISP may want to provide a Junkbuster that is accessible by "the
623 world" and yet restrict use of some of their private content to hosts
624 on its internal network (i.e. its own subscribers). Say, for instance
625 the ISP owns the Class-B IP address block 123.124.0.0 (a 16 bit
626 netmask). This is how they could do it:
permit-access 0.0.0.0/0 0.0.0.0/0 # other clients can go anywhere
# with the following exceptions
deny-access 0.0.0.0/0 123.124.0.0/16 # block all external requests to
# sites on the ISP's network
permit-access 0.0.0.0/0 www.my_isp.com # except for the ISP's main site
permit-access 123.124.0.0/16 0.0.0.0/0 # the ISP's clients can go anywhere
640 Note that if some hostnames are listed with multiple IP addresses, the
641 primary value returned by DNS (via gethostbyname()) is used. Default:
642 Anyone can access the proxy.
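The "last match wins, default is deny" rule is easy to get wrong, so
here is a minimal Python sketch of it using the standard ipaddress
module. It is not Junkbuster's code. The names parse_acl_line and
is_permitted are hypothetical, hostnames are not resolved (only dotted
addresses are accepted), and 123.124.1.1 is an invented address
standing in for www.my_isp.com in the ISP example above.

```python
import ipaddress

ANYWHERE = ipaddress.ip_network("0.0.0.0/0")


def parse_acl_line(line):
    """Parse 'ACTION SRC_ADDR[/SRC_MASKLEN] [DST_ADDR[/DST_MASKLEN]]'."""
    fields = line.split("#", 1)[0].split()
    action = fields[0]
    if action not in ("permit-access", "deny-access"):
        raise ValueError("unknown ACL action: %r" % action)
    src = ipaddress.ip_network(fields[1], strict=False)
    # No DST_ADDR means that all destinations match.
    dst = ipaddress.ip_network(fields[2], strict=False) if len(fields) > 2 else ANYWHERE
    return action, src, dst


def is_permitted(acl, client, destination):
    """Apply the ACL: the LAST matching entry decides."""
    if not acl:
        return True  # no access settings: the proxy talks to anyone
    client_ip = ipaddress.ip_address(client)
    dest_ip = ipaddress.ip_address(destination)
    allowed = False  # with an ACL in place, the default is to deny
    for action, src, dst in acl:
        if client_ip in src and dest_ip in dst:
            allowed = action == "permit-access"
    return allowed


acl = [parse_acl_line(entry) for entry in (
    "permit-access 0.0.0.0/0 0.0.0.0/0",
    "deny-access 0.0.0.0/0 123.124.0.0/16",
    "permit-access 0.0.0.0/0 123.124.1.1",  # stand-in for www.my_isp.com
    "permit-access 123.124.0.0/16 0.0.0.0/0",
)]
print(is_permitted(acl, "8.8.8.8", "123.124.5.5"))      # outsider, private site
print(is_permitted(acl, "8.8.8.8", "123.124.1.1"))      # outsider, main site
print(is_permitted(acl, "123.124.9.9", "123.124.5.5"))  # subscriber
```

Because later entries override earlier ones, the broad "permit" comes
first and the exceptions follow it. Reordering the entries changes the
outcome.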
643 _________________________________________________________________
This feature allows chaining of HTTP requests via multiple proxies. It
can be used to better protect privacy and confidentiality when
accessing specific domains by routing requests to those domains to a
special purpose filtering proxy such as lpwa.com. Alternatively, a
caching proxy can be used to speed up browsing.
653 It can also be used in an environment with multiple networks to route
654 requests via multiple gateways allowing transparent access to multiple
655 networks without having to modify browser configurations.
Also specified here are SOCKS proxies. Junkbuster supports SOCKS 4 and
SOCKS 4A. The difference is that SOCKS 4A will resolve the target
hostname using DNS on the SOCKS server, not our local DNS client.
661 The syntax of each line is:
663 forward target_domain[:port] http_proxy_host[:port]
664 forward-socks4 target_domain[:port] socks_proxy_host[:port]
665 http_proxy_host[:port]
666 forward-socks4a target_domain[:port] socks_proxy_host[:port]
667 http_proxy_host[:port]
If http_proxy_host is ".", then requests are not forwarded to an HTTP
proxy but are made directly to the web servers.
672 Lines are checked in sequence, and the last match wins.
674 There is an implicit line equivalent to the following, which specifies
675 that anything not finding a match on the list is to go out without
676 forwarding or gateway protocol, like so:
678 forward .* . # implicit
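The selection rule (check every line in order, let the last match win,
and fall back to the implicit direct rule) can be sketched in Python as
follows. This is an illustration only. The function name
choose_forwarder is hypothetical, and shell-style patterns from the
standard fnmatch module are used as a stand-in for Junkbuster's own
target_domain matching, whose exact syntax is not reproduced here.

```python
from fnmatch import fnmatchcase


def choose_forwarder(rules, host):
    """Return the proxy for host; "." means connect directly.

    rules is a list of (pattern, proxy) pairs in file order.
    """
    chosen = "."  # the implicit first rule: no forwarding
    for pattern, proxy in rules:
        if fnmatchcase(host.lower(), pattern.lower()):
            chosen = proxy  # keep going: a later match overrides this one
    return chosen


# Everything via a caching proxy, except requests to the ISP itself.
rules = [
    ("*", "caching.myisp.net:8000"),
    ("*.myisp.net", "."),
]
print(choose_forwarder(rules, "www.example.com"))
print(choose_forwarder(rules, "www.myisp.net"))
```

As with the ACL, order matters: the catch-all rule has to come first,
with the exceptions after it.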
680 In the following common configuration, everything goes to Lucent's
681 LPWA, except SSL on port 443 (which it doesn't handle):
683 forward .* lpwa.com:8000
686 See the FAQ for instructions on how to automate the login procedure
687 for LPWA. Some users have reported difficulties related to LPWA's use
688 of "." as the last element of the domain, and have said that this can
691 forward lpwa. lpwa.com:8000
(NOTE: the syntax for specifying target_domain has changed since the
previous paragraph was written -- it will not work now. More
information is welcome.)
697 In this fictitious example, everything goes via an ISP's caching
698 proxy, except requests to that ISP:
700 forward .* caching.myisp.net:8000
703 For the @home network, we're told the forwarding configuration is
706 forward .* proxy:8080
708 Also, we're told they insist on getting cookies and JavaScript, so you
709 should add home.com to the cookie file. We consider JavaScript a
710 security risk. Java need not be enabled.
712 In this example direct connections are made to all "internal" domains,
713 but everything else goes through Lucent's LPWA by way of the company's
714 SOCKS gateway to the Internet.
716 forward-socks4 .* lpwa.com:8000 firewall.my_company.com:1080
717 forward my_company.com .
719 This is how you could set up a site that always uses SOCKS but no
722 forward-socks4a .* . firewall.my_company.com:1080
724 An advanced example for network administrators:
726 If you have links to multiple ISPs that provide various special
727 content to their subscribers, you can configure forwarding to pass
728 requests to the specific host that's connected to that ISP so that
729 everybody can see all of the content on all of the ISPs.
731 This is a bit tricky, but here's an example:
733 host-a has a PPP connection to isp-a.com. And host-b has a PPP
734 connection to isp-b.com. host-a can run a Junkbuster proxy with
735 forwarding like this:
738 forward isp-b.com host-b:8000
740 host-b can run a Junkbuster proxy with forwarding like this:
743 forward isp-a.com host-a:8000
745 Now, anyone on the Internet (including users on host-a and host-b) can
746 set their browser's proxy to either host-a or host-b and be able to
747 browse the content on isp-a or isp-b.
749 Here's another practical example, for University of Kent at Canterbury
750 students with a network connection in their room, who need to use the
751 University's Squid web cache.
753 forward *. ssbcache.ukc.ac.uk:3128 # Use the proxy, except for:
754 forward .ukc.ac.uk . # Anything on the same domain as us
755 forward * . # Host with no domain specified
756 forward 129.12.*.* . # A dotted IP on our /16 network.
757 forward 127.*.*.* . # Loopback address
758 forward localhost.localdomain . # Loopback address
759 forward www.ukc.mirror.ac.uk . # Specific host
If you intend to chain Junkbuster and squid locally, the recommended
order is browser -> squid -> junkbuster.
764 Your squid configuration could then look like this:
766 # Define junkbuster as parent cache
768 cache_peer 127.0.0.1 parent 8000 0 no-query
770 # Define ACL for protocol FTP
772 # Do not forward ACL FTP to junkbuster
773 always_direct allow FTP
774 # Do not forward ACL CONNECT (https) to junkbuster
775 always_direct allow CONNECT
776 # Forward the rest to junkbuster
777 never_direct allow all
778 _________________________________________________________________
780 3.1.5. Windows GUI Options
782 Junkbuster has a number of options specific to the Windows GUI
785 If "activity-animation" is set to 1, the Junkbuster icon will animate
786 when "Junkbuster" is active. To turn off, set to 0.
790 If "log-messages" is set to 1, Junkbuster will log messages to the
795 If "log-buffer-size" is set to 1, the size of the log buffer, i.e. the
796 amount of memory used for the log messages displayed in the console
797 window, will be limited to "log-max-lines" (see below).
Warning: Setting this to 0 will cause the buffer to grow without
limit and eat up all your memory!
804 log-max-lines is the maximum number of lines held in the log buffer.
809 If "log-highlight-messages" is set to 1, Junkbuster will highlight
810 portions of the log messages with a bold-faced font:
812 log-highlight-messages 1
814 The font used in the console window:
816 log-font-name Comic Sans MS
818 Font size used in the console window:
822 "show-on-task-bar" controls whether or not Junkbuster will appear as a
823 button on the Task bar when minimized:
827 If "close-button-minimizes" is set to 1, the Windows close button will
828 minimize Junkbuster instead of closing the program (close with the
829 exit option on the File menu).
831 close-button-minimizes 1
The "hide-console" option is specific to the MS-Win console version of
Junkbuster. If this option is used, Junkbuster will disconnect from
and hide the command console.
838 _________________________________________________________________
840 3.2. The Actions File
842 The "ijb.action" file (formerly actionsfile) is used to define what
843 actions Junkbuster takes, and thus determines how images, cookies and
844 various other aspects of HTTP content and transactions are handled.
845 Images can be anything you want, including ads, banners, or just some
846 obnoxious image that you would rather not see. Cookies can be accepted
847 or rejected, or accepted only during the current browser session (i.e.
848 not written to disk). Changes to ijb.action should be immediately
849 visible to Junkbuster without the need to restart.
851 To determine which actions apply to a request, the URL of the request
852 is compared to all patterns in this file. Every time it matches, the
853 list of applicable actions for the URL is incrementally updated. You
854 can trace this process by visiting [36]http://i.j.b/show-url-info.
The actions file can be edited with a browser by loading
http://i.j.b/ and then selecting "Edit Actions".
859 There are four types of lines in this file: comments (begin with a "#"
860 character), actions, aliases and patterns, all of which are explained
861 below, as well as the configuration file syntax that Junkbuster
863 _________________________________________________________________
865 3.2.1. URL Domain and Path Syntax
867 Generally, a pattern has the form <domain>/<path>, where both the
868 <domain> and <path> part are optional. If you only specify a domain
869 part, the "/" can be left out:
871 www.example.com - is a domain only pattern and will match any request
872 to "www.example.com".
874 www.example.com/ - means exactly the same.
876 www.example.com/index.html - matches only the single document
877 "/index.html" on "www.example.com".
879 /index.html - matches the document "/index.html", regardless of the
882 index.html - matches nothing, since it would be interpreted as a
883 domain name and there is no top-level domain called ".html".
885 The matching of the domain part offers some flexible options: if the
886 domain starts or ends with a dot, it becomes unanchored at that end.
889 .example.com - matches any domain that ENDS in ".example.com".
891 www. - matches any domain that STARTS with "www".
Additionally, there are wildcards that you can use in the domain names
themselves. They work much like shell wildcards: "*" stands for zero
or more arbitrary characters, and "?" stands for any single character.
You can also define character classes in square brackets, and all of
these can be freely mixed:
899 ad*.example.com - matches "adserver.example.com", "ads.example.com",
900 etc but not "sfads.example.com".
902 *ad*.example.com - matches all of the above, and then some.
904 .?pix.com - matches "www.ipix.com", "pictures.epix.com",
905 "a.b.c.d.e.upix.com", etc.
907 www[1-9a-ez].example.com - matches "www1.example.com",
908 "www4.example.com", "wwwd.example.com", "wwwz.example.com", etc., but
909 not "wwww.example.com".
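The domain matching rules above can be approximated in a few lines of Python. This is only a sketch, not Junkbuster's own matching code: the function name domain_matches is hypothetical, and it assumes that shell-style matching (Python's fnmatch) is close enough to the wildcard behaviour described here and that domain names compare case-insensitively.

```python
from fnmatch import fnmatchcase


def domain_matches(pattern: str, host: str) -> bool:
    """Return True if host matches a domain pattern of the kind shown above."""
    # Assumption: domain names are compared without regard to case.
    pattern, host = pattern.lower(), host.lower()
    if pattern.startswith("."):
        pattern = "*" + pattern  # leading dot: unanchored on the left
    if pattern.endswith("."):
        pattern = pattern + "*"  # trailing dot: unanchored on the right
    # fnmatchcase understands "*", "?" and "[...]" character classes.
    return fnmatchcase(host, pattern)
```

For instance, domain_matches("ad*.example.com", "sfads.example.com") is false, while domain_matches(".?pix.com", "pictures.epix.com") is true, in line with the examples above.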
If Junkbuster was compiled with "pcre" support (the default), Perl
compatible regular expressions can be used. See the pcre/docs/
directory or "man perlre" (also available at
[38]http://www.perldoc.com/perl5.6/pod/perlre.html) for details. A
brief discussion of regular expressions is in the [39]Appendix. For
example:
918 /.*/advert[0-9]+\.jpe?g - would match a URL from any domain, with any
919 path that includes "advert" followed immediately by one or more
920 digits, then a "." and ending in either "jpeg" or "jpg". So we match
921 "example.com/ads/advert2.jpg", and
922 "www.example.com/ads/banners/advert39.jpeg", but not
923 "www.example.com/ads/banners/advert39.gif" (no gifs in the example
926 Please note that matching in the path is case INSENSITIVE by default,
927 but you can switch to case sensitive at any point in the pattern by
928 using the "(?-i)" switch:
930 www.example.com/(?-i)PaTtErN.* - will match only documents whose path
931 starts with "PaTtErN" in exactly this capitalization.
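The path rules can be tried out with Python's re module. This is a rough sketch under stated assumptions: the helper path_matches is hypothetical, the pattern is assumed to be anchored at the start of the path, and Python's scoped form "(?-i:...)" (Python 3.6 or later) stands in for the bare "(?-i)" switch, which Python does not accept in the middle of a pattern.

```python
import re

# Patterns taken from the examples above; CASED is rewritten for Python syntax.
ADVERT = r"/.*/advert[0-9]+\.jpe?g"
CASED = r"/(?-i:PaTtErN).*"


def path_matches(pattern: str, path: str) -> bool:
    """Match a path pattern, case-insensitively unless the pattern says otherwise."""
    # Assumption: patterns are anchored at the beginning of the path only.
    return re.match(pattern, path, re.IGNORECASE) is not None
```

With these definitions, path_matches(ADVERT, "/ads/advert2.jpg") succeeds and path_matches(CASED, "/pattern/index.html") fails, because the capitalization differs.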
932 _________________________________________________________________
936 Actions are enabled if preceded with a "+", and disabled if preceded
937 with a "-". Actions are invoked by enclosing the action name in curly
938 braces (e.g. {+some_action}), followed by a list of URLs to which the
939 action applies. There are three classes of actions:
941 * Boolean (e.g. "+/-block"):
942 {+name} # enable this action
943 {-name} # disable this action
* Parameterized (e.g. "+/-hide-user-agent"):
946 {+name{param}} # enable action and set parameter to "param"
947 {-name} # disable action
949 * Multi-value (e.g. "{+/-add-header{Name: value}}",
950 "{+/-wafer{name=value}}"):
951 {+name{param}} # enable action and add parameter "param"
952 {-name{param}} # remove the parameter "param"
953 {-name} # disable this action totally
955 If nothing is specified in this file, no "actions" are taken. So in
956 this case JunkBuster would just be a normal, non-blocking,
957 non-anonymizing proxy. You must specifically enable the privacy and
958 blocking features you need (although the provided default ijb.action
959 file will give a good starting point).
961 Later defined actions always over-ride earlier ones. For multi-valued
962 actions, the actions are applied in the order they are specified.
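To make the three classes and the "later overrides earlier" rule concrete, here is a small Python model. It is a sketch only: the names MULTI_VALUE, apply_action and resolve are made up for illustration, the tokens are written without the surrounding curly braces, and the way Junkbuster stores this state internally may well differ.

```python
# Assumption: only the two multi-value actions named above are treated as lists.
MULTI_VALUE = {"add-header", "wafer"}


def apply_action(state: dict, token: str) -> None:
    """Apply one token such as '+block' or '-add-header{Name: value}' to state."""
    sign, rest = token[0], token[1:]
    name, _, param = rest.partition("{")
    param = param[:-1] if param else None  # drop the closing "}"
    if sign == "+":
        if name in MULTI_VALUE:
            state.setdefault(name, []).append(param)  # add one more value
        else:
            # Boolean actions store True; parameterized ones store the parameter.
            state[name] = True if param is None else param
    elif sign == "-":
        if name in MULTI_VALUE and param is not None:
            values = state.get(name, [])
            if param in values:
                values.remove(param)  # remove just this one value
            if not values:
                state.pop(name, None)
        else:
            state.pop(name, None)  # disable the action totally
    else:
        raise ValueError("action must start with '+' or '-': " + token)


def resolve(tokens: list) -> dict:
    """Apply tokens in order, so later ones override earlier ones."""
    state = {}
    for token in tokens:
        apply_action(state, token)
    return state
```

Calling resolve(["+block", "-block"]) leaves nothing enabled, and a second "+hide-user-agent{...}" simply replaces the parameter set by the first.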
The valid Junkbuster "actions" are:
966 * Add the specified HTTP header, which is not checked for validity.
967 You may specify this many times to specify many different headers:
968 +add-header{Name: value}
970 * Block this URL totally.
* De-animate all animated GIF images, i.e. reduce them to their last
frame. This will also shrink the images considerably (in bytes,
not pixels!). If the option "first" is given, the first frame of
the animation is used as the replacement. If "last" is given, the
last frame of the animation is used instead, which probably makes
more sense for most banner animations, but also has the risk of
not showing the entire last frame (if it is only a delta to an
981 +deanimate-gifs{last}
982 +deanimate-gifs{first}
984 * "+downgrade" will downgrade HTTP/1.1 client requests to HTTP/1.0
985 and downgrade the responses as well. Use this action for servers
986 that use HTTP/1.1 protocol features that Junkbuster doesn't handle
987 well yet. HTTP/1.1 is only partially implemented. Default is not
988 to downgrade requests.
* Many sites, like yahoo.com, don't just link to other sites.
Instead, they link to a script on their own server, passing the
destination as a parameter, and that script then redirects you to
the final target. URLs resulting from this scheme typically look
like: http://some.place/some_script?http://some.where-else.
Sometimes there are even multiple consecutive redirects encoded
in the URL. These redirections via scripts make your web browsing
more traceable, since the server from which you follow such a link
can see where you go. Apart from that, valuable bandwidth and
time are wasted while your browser asks the server for one redirect
after another. Plus, it feeds the advertisers.
The "+fast-redirects" option enables interception of these
requests by Junkbuster, which will cut off all but the last valid
URL in the request and send a local redirect back to your browser
without contacting the remote site.
1008 * Filter the website through the re_filterfile:
1011 * Block any existing X-Forwarded-for header, and do not add a new
1015 * If the browser sends a "From:" header containing your e-mail
1016 address, this either completely removes the header ("block"), or
1017 changes it to the specified e-mail address.
1019 +hide-from{spam@sittingduck.xqq}
1021 * Don't send the "Referer:" (sic) header to the web site. You can
1022 block it, forge a URL to the same server as the request (which is
1023 preferred because some sites will not send images otherwise) or
1024 set it to a constant string of your choice.
1025 +hide-referer{block}
1026 +hide-referer{forge}
1027 +hide-referer{http://nowhere.com}
1029 * Alternative spelling of "+hide-referer". It has the same
1030 parameters, and can be freely mixed with, "+hide-referer".
1031 ("referrer" is the correct English spelling, however the HTTP
1032 specification has a bug - it requires it to be spelled "referer".)
1035 * Change the "User-Agent:" header so web servers can't tell your
1036 browser type. Warning! This breaks many web sites. Specify the
1037 user-agent value you want. Example, pretend to be using Netscape
1039 +hide-user-agent{Mozilla (X11; I; Linux 2.0.32 i586)}
1041 * Treat this URL as an image. This only matters if it's also
1042 "+block"ed, in which case a "blocked" image can be sent rather
1043 than a HTML page. See "+image-blocker{}" below for the control
1044 over what is actually sent.
* Decides what to do with URLs that end up tagged with "{+block
+image}". There are four options. "-image-blocker" will send an HTML
"blocked" page, usually resulting in a "broken image" icon.
"+image-blocker{logo}" will send a "JunkBuster" image.
"+image-blocker{blank}" will send a 1x1 transparent GIF image. And
finally, "+image-blocker{http://xyz.com}" will send an HTTP
temporary redirect to the specified image. This has the advantage
of the icon being cached by the browser, which will speed up
1056 +image-blocker{logo}
1057 +image-blocker{blank}
1058 +image-blocker{http://i.j.b/send-banner}
* By default (i.e. in the absence of a "+limit-connect" action),
Junkbuster will only allow CONNECT requests to port 443, the
standard port for https, as a precaution.
The CONNECT method exists in HTTP to allow access to secure
websites (https:// URLs) through proxies. It works very simply:
the proxy connects to the server on the specified port, and then
short-circuits its connections to the client and to the remote
server. This can be a big security hole, since CONNECT-enabled
proxies can be abused as TCP relays very easily.
If you want to allow CONNECT for more ports than this, or want to
forbid CONNECT altogether, you can specify a comma separated list
of ports and port ranges (the latter using dashes, with the
minimum defaulting to 0 and max to 65K):
+limit-connect{443} # This is the default and need not be
1075 +limit-connect{80,443} # Ports 80 and 443 are OK.
1076 +limit-connect{-3, 7, 20-100, 500-} # Port less than 3, 7, 20 to
1078 #and above 500 are OK.
* "+no-compression" prevents the website from compressing the data.
Some websites do this, which can be a problem for Junkbuster,
since "+filter", "+no-popup" and "+deanimate-gifs" will not work on
compressed data. This will slow down connections to those
websites, though. By default, "+no-compression" is turned on.
1087 * If the website sets cookies, "no-cookies-keep" will make sure they
1088 are erased when you exit and restart your web browser. This makes
1089 profiling cookies useless, but won't break sites which require
1090 cookies so that you can log in for transactions. Default: on.
1093 * Prevent the website from reading cookies:
1096 * Prevent the website from setting cookies:
1099 * Filter the website through a built-in filter to disable those
1100 obnoxious JavaScript pop-up windows via window.open(), etc. The
1101 two alternative spellings are equivalent.
1105 * This action only applies if you are using a jarfile for saving
1106 cookies. It sends a cookie to every site stating that you do not
1107 accept any copyright on cookies sent to you, and asking them not
1108 to track you. Of course, this is a (relatively) unique header they
1109 could use to track you.
1112 * This allows you to add an arbitrary cookie. It can be specified
1113 multiple times in order to add as many cookies as you like.
1116 The meaning of any of the above is reversed by preceding the action
1117 with a "-", in place of the "+".
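The port list syntax accepted by "+limit-connect" (described in the list above) can be illustrated with a short Python sketch. The function connect_allowed is hypothetical, and it makes two assumptions that the text does not spell out: ranges are treated as inclusive at both ends, and "65K" is taken to mean 65535.

```python
def connect_allowed(spec: str, port: int) -> bool:
    """Return True if port is covered by a list such as '-3, 7, 20-100, 500-'."""
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        low, dash, high = item.partition("-")
        if not dash:
            # A single port number.
            if port == int(item):
                return True
            continue
        # A range; a missing minimum defaults to 0, a missing maximum to 65535.
        lowest = int(low) if low else 0
        highest = int(high) if high else 65535
        if lowest <= port <= highest:
            return True
    return False
```

With the default list "443", only port 443 passes; with "-3, 7, 20-100, 500-", port 50 passes and port 443 does not.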
1121 Turn off cookies by default, then allow a few through for specified
# Turn off all persistent cookies
1125 { +no-cookies-read }
1127 # Allow cookies for this browser session ONLY
1128 { +no-cookies-keep }
# Exceptions to the above, sites that benefit from persistent cookie
1131 { -no-cookies-read }
1133 { -no-cookies-keep }
1139 # Alternative way of saying the same thing
1140 {-no-cookies-set -no-cookies-read -no-cookies-keep}
1144 Now turn off "fast redirects", and then we allow two exceptions:
1149 # Reverse it for these two sites, which don't work right without it.
1151 www.ukc.ac.uk/cgi-bin/wac\.cgi\?
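For reference, the rewriting that "+fast-redirects" performs, cutting off all but the last URL embedded in a request, might look roughly like this in Python. This is a guess at the idea, not Junkbuster's code: the function name is invented, and the sketch ignores details such as URL-encoded targets or checking that the last URL is valid.

```python
import re
from typing import Optional


def fast_redirect_target(url: str) -> Optional[str]:
    """Return the last URL embedded in url, or None if there is no embedded URL."""
    starts = [match.start() for match in re.finditer(r"https?://", url)]
    if len(starts) < 2:
        return None  # only the request URL itself, nothing to cut off
    return url[starts[-1]:]
```

For the example URL given earlier, http://some.place/some_script?http://some.where-else, this returns http://some.where-else.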
1154 Turn on page filtering, with one exception for sourceforge:
1156 # Run everything through the default filter file (re_filterfile):
1159 # But please don't re_filter code from sourceforge!
1161 .cvs.sourceforge.net
Now some URLs that we want "blocked", i.e. we won't see them. Many of
these use regular expressions that will expand to match multiple URLs:
1168 /.*/(.*[-_.])?ads?[0-9]?(/|[-_.].*|\.(gif|jpe?g))
1169 /.*/(.*[-_.])?count(er)?(\.cgi|\.dll|\.exe|[?/])
1170 /.*/(ng)?adclient\.cgi
1171 /.*/(plain|live|rotate)[-_.]?ads?/
1172 /.*/(sponsor)s?[0-9]?/
1173 /.*/_?(plain|live)?ads?(-banners)?/
1175 /.*/ad(sdna_image|gifs?)/
1176 /.*/ad(server|stream|juggler)\.(cgi|pl|dll|exe)
1180 /.*/adv((er)?ts?|ertis(ing|ements?))?/
1184 /.*/cgi-bin/centralad/getimage
1185 /.*/images/addver\.gif
1186 /.*/images/marketing/.*\.(gif|jpe?g)
1190 /.*/sponsors?[0-9]?/
1191 /.*/advert[0-9]+\.jpg
1198 /graphics/defaultAd/
1200 /image\.ng/transactionID
1201 /images/.*/.*_anim\.gif # alvin brattli
1202 /ip_img/.*\.(gif|jpe?g)
1206 /cgi-bin/nph-adclick.exe/
1207 /.*/Image/BannerAdvertising/
1209 /.*/adlib/server\.cgi
1211 _________________________________________________________________
Custom "actions", known to Junkbuster as "aliases", can be defined by
combining other "actions". These can in turn be invoked just like the
built-in "actions". Currently, an alias name can contain any character
except space, tab, "=", "{" or "}", but please use only "a"-"z",
"0"-"9", "+", and "-". Alias names are not case sensitive, and must be
defined before anything else in the ijb.action file. There can
only be one set of "aliases" defined.
1223 Now let's define a few aliases:
# Useful custom aliases we can use later. These must come first!
1227 +no-cookies = +no-cookies-set +no-cookies-read
1228 -no-cookies = -no-cookies-set -no-cookies-read
1229 fragile = -block -no-cookies -filter -fast-redirects -hide-refere
1231 shop = -no-cookies -filter -fast-redirects
1232 +imageblock = +block +image
1233 #For people who don't like to type too much: ;-)
1236 c2 = -no-cookies-set +no-cookies-read
1237 c3 = +no-cookies-set -no-cookies-read
1238 #... etc. Customize to your heart's content.
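One way to picture how aliases expand into built-in actions is the following Python sketch. The ALIASES table copies a few definitions from the example above; the function expand is hypothetical and is not meant to mirror how Junkbuster resolves aliases internally.

```python
# A few alias definitions from the example above, keyed by lower-case name.
ALIASES = {
    "+no-cookies": "+no-cookies-set +no-cookies-read",
    "-no-cookies": "-no-cookies-set -no-cookies-read",
    "shop": "-no-cookies -filter -fast-redirects",
    "+imageblock": "+block +image",
}


def expand(name: str, aliases: dict = ALIASES, _seen: tuple = ()) -> list:
    """Expand an alias (possibly built from other aliases) into plain actions."""
    key = name.lower()  # alias names are not case sensitive
    if key not in aliases:
        return [name]  # not an alias, so treat it as a built-in action
    if key in _seen:
        raise ValueError("alias refers to itself: " + name)
    actions = []
    for part in aliases[key].split():
        actions.extend(expand(part, aliases, _seen + (key,)))
    return actions
```

Here expand("shop") yields -no-cookies-set, -no-cookies-read, -filter and -fast-redirects, because "-no-cookies" is itself an alias.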
1240 Some examples using our "shop" and "fragile" aliases from above:
1242 # These sites are very complex and require
1243 # minimal interference.
1245 .office.microsoft.com
1246 .windowsupdate.microsoft.com
1248 # Shopping sites - still want to block ads.
1251 .worldpay.com # for quietpc.com
1254 # These shops require pop-ups
1258 _________________________________________________________________
1260 3.3. The Filter File
1262 The filter file defines what filtering of web pages Junkbuster does.
1263 The default filter file is re_filterfile, located in the config
1264 directory. In this file, any document content, whether viewable text
1265 or embedded non-visible content, can be changed.
1267 This file uses regular expressions to alter or remove any string in
1268 the target page. Some examples from the included default
1271 Stop web pages from displaying annoying messages in the status bar by
1272 deleting such references:
1274 # The status bar is for displaying link targets, not pointless buzzwo
1276 # Again, check it out on http://www.airport-cgn.de/.
1277 s/status='.*?';*//ig
1279 Just for kicks, replace any occurrence of "Microsoft" with
1282 s/microsoft(?!.com)/MicroSuck/ig
1284 Kill those auto-refresh tags:
1286 # Kill refresh tags. I like to refresh myself. Manually.
1287 # check it out on http://www.airport-cgn.de/ and go to the arrivals p
1290 s/<meta[^>]*http-equiv[^>]*refresh.*URL=([^>]*?)"?>/<link rev="x-refr
1292 s/<meta[^>]*http-equiv="?page-enter"?[^>]*content=[^>]*>/<!--no page
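To experiment with substitutions like these outside of Junkbuster, the two shorter ones can be reproduced with Python's re.sub. This is only an approximation: apply_filters is a made-up name, Python's regular expressions are close to but not identical to pcre, and the "i" and "g" modifiers are mapped to re.IGNORECASE and to re.sub's default of replacing every occurrence.

```python
import re


def apply_filters(text: str) -> str:
    """Apply two of the substitutions shown above to a piece of page content."""
    # s/status='.*?';*//ig
    text = re.sub(r"status='.*?';*", "", text, flags=re.IGNORECASE)
    # s/microsoft(?!.com)/MicroSuck/ig
    text = re.sub(r"microsoft(?!.com)", "MicroSuck", text, flags=re.IGNORECASE)
    return text
```

Feeding it "Microsoft at www.microsoft.com" changes the first word only, since the second occurrence is followed by ".com".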
1294 _________________________________________________________________
When Junkbuster displays one of its internal pages, such as a 404 Not
Found error page, it uses the appropriate template. On Linux, BSD, and
Unix, these are located in /etc/junkbuster/templates by default. These
may be customized, if desired.
1302 _________________________________________________________________
1304 4. Quickstart to Using Junkbuster
Install the package, then run and enjoy! Junkbuster accepts only one
command line option -- the configuration file to be used. Example Unix
1311 # /usr/sbin/junkbuster /etc/junkbuster/config
1314 An init script is provided for SuSE and Redhat.
For SuSE: /etc/rc.d/junkbuster start
1318 For RedHat: /etc/rc.d/init.d/junkbuster start
If no configuration file is specified on the command line, Junkbuster
will look for a file named config in the current directory, except on
Amiga, where it will look for AmiTCP:db/junkbuster/config, and Win32,
where it will try config.txt. If no file is specified on the command
line and no default configuration file can be found, Junkbuster will
Be sure your browser is set to use the proxy, which by default is at
localhost, port 8000. With Netscape (and Mozilla), this can be set
under Edit -> Preferences -> Advanced -> Proxies -> HTTP Proxy. For
Internet Explorer: Tools -> Internet Properties -> Connections -> LAN
Setting. Then check "Use Proxy" and fill in the appropriate info
(Address: localhost, Port: 8000). Include HTTPS as well if you want
HTTPS proxy support too.
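The same proxy settings apply to any HTTP client, not just browsers. As an illustration, a Python script could be pointed at Junkbuster like this. The sketch assumes the default address of localhost, port 8000, mentioned above; the names PROXY_URL, make_opener and fetch_status are chosen here for the example.

```python
import urllib.request

# Assumption: Junkbuster is listening at its default address and port.
PROXY_URL = "http://localhost:8000"


def make_opener(proxy_url: str = PROXY_URL) -> urllib.request.OpenerDirector:
    """Build a URL opener that sends HTTP and HTTPS requests through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_status(url: str) -> int:
    """Fetch url through the proxy and return the HTTP status code."""
    with make_opener().open(url, timeout=10) as response:
        return response.status
```

Calling fetch_status("http://www.example.com/") will only work while Junkbuster is actually running on that port.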
The included default configuration files should give a reasonable
starting point, though they may be somewhat aggressive in blocking junk.
You will probably want to keep an eye out for sites that require
persistent cookies, and add these to ijb.action as needed. By default,
most of these will be accepted only during the current browser
session, until you add them to the configuration. If you want the
browser to handle this instead, you will need to edit ijb.action and
disable this feature. If you use more than one browser, it would make
more sense to let Junkbuster handle this, in which case the
browser(s) should be set to accept all cookies.
1345 If a particular site shows problems loading properly, try adding it to
1346 the {fragile} section of ijb.action. This will turn off most actions
HTTP/1.1 support is not fully implemented. If browsers that support
HTTP/1.1 (like Mozilla or recent versions of I.E.) experience
problems, you might try to force HTTP/1.0 compatibility. For Mozilla,
look under Edit -> Preferences -> Debug -> Networking. Or set the
"+downgrade" config option in ijb.action.
After running Junkbuster for a while, you can start to fine tune the
configuration to suit your personal, or site, preferences and
requirements. There are many, many aspects that can be customized.
"Actions" (as specified in ijb.action) can be adjusted by pointing
your browser to [40]http://i.j.b/ and then following the link to "edit
the actions list". (This is an internal page and does not require
1363 In fact, various aspects of Junkbuster configuration can be viewed
1364 from this page, including current configuration parameters, source
1365 code version numbers, the browser's request headers, and "actions"
1366 that apply to a given URL. In addition to the ijb.action file editor
1367 mentioned above, Junkbuster can also be turned "on" and "off" from
If you encounter problems, please verify that it is a Junkbuster bug by
disabling Junkbuster and then trying the same page. Also, try another
browser if possible to eliminate browser or site problems. Before
reporting it as a bug, check whether an enabled configuration option
is causing the page not to load. You can then add an
exception for that page or site. If it is a bug, please report it to the
developers (see below).
1377 _________________________________________________________________
1379 5. Contact the Developers
1381 Feature requests and other questions should be posted to the
1382 [41]Feature request page at SourceForge. There is also an archive
1385 Anyone interested in actively participating in development and related
1386 discussions can join the appropriate mailing list [42]here. Archives
1387 are available here too.
1389 Please report bugs, using the form at [43]Sourceforge. Please try to
1390 verify that it is a Junkbuster bug, and not a browser or site bug
1391 first. Also, check to make sure this is not already a known bug.
1392 _________________________________________________________________
1394 6. Copyright and History
1398 Internet Junkbuster is free software; you can redistribute it and/or
1399 modify it under the terms of the GNU General Public License as
1400 published by the Free Software Foundation; either version 2 of the
1401 License, or (at your option) any later version.
1403 This program is distributed in the hope that it will be useful, but
1404 WITHOUT ANY WARRANTY; without even the implied warranty of
1405 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
1406 General Public License for more details, which is available from
1407 [44]the Free Software Foundation, Inc, 59 Temple Place - Suite 330,
1408 Boston, MA 02111-1307, USA.
1409 _________________________________________________________________
1413 Junkbuster was originally written by Anonymous Coders and
1414 [45]JunkBusters Corporation, and was released as free open-source
1415 software under the GNU GPL. [46]Stefan Waldherr made many
1416 improvements, and started the [47]SourceForge project to rekindle
1417 development. The last stable release was v2.0.2, which has now grown
1419 _________________________________________________________________
1423 [48]http://sourceforge.net/projects/ijbswa
1425 [49]http://ijbswa.sourceforge.net/
1429 [51]http://www.junkbusters.com/ht/en/cookies.html
1431 [52]http://www.waldherr.org/junkbuster/
1433 [53]http://privacy.net/analyze/
1435 [54]http://www.squid-cache.org/
1436 _________________________________________________________________
1440 8.1. Regular Expressions
Junkbuster can use "regular expressions" in various config files,
assuming support for "pcre" (Perl Compatible Regular Expressions) is
compiled in, which is the default. Such configuration directives do
not require regular expressions, but they can be used to increase
flexibility by matching a pattern with wildcards against URLs.
1448 If you are reading this, you probably don't understand what "regular
1449 expressions" are, or what they can do. So this will be a very brief
1450 introduction only. A full explanation would require a book ;-)
1452 "Regular expressions" is a way of matching one character expression
1453 against another to see if it matches or not. One of the "expressions"
1454 is a literal string of readable characters (letter, numbers, etc), and
1455 the other is a complex string of literal characters combined with
1456 wildcards, and other special characters, called metacharacters. The
1457 "metacharacters" have special meanings and are used to build the
1458 complex pattern to be matched against. Perl Compatible Regular
1459 Expressions is an enhanced form of the regular expression language
1460 with backward compatibility.
To make a simple analogy, we do something similar when we use wildcard
characters when listing files with the dir command in DOS. *.* matches
all filenames. The "special" character here is the asterisk, which
matches any and all characters. We can be more specific and use ? to
match just individual characters. So "dir file?.txt" would match
"file1.txt", "file2.txt", etc. We are pattern matching, using a
similar technique to "regular expressions"!
Regular expressions do essentially the same thing, but are much, much
more powerful. There are many more "special characters" and ways of
building complex patterns. Let's look at a few of the common
ones, and then some examples:
1475 . - Matches any single character, e.g. "a", "A", "4", ":", or "@".
1477 ? - The preceding character or expression is matched ZERO or ONE
1480 + - The preceding character or expression is matched ONE or MORE
1483 * - The preceding character or expression is matched ZERO or MORE
1486 \ - The "escape" character denotes that the following character should
1487 be taken literally. This is used where one of the special characters
1488 (e.g. ".") needs to be taken literally and not as a special
1491 [] - Characters enclosed in brackets will be matched if any of the
1492 enclosed characters are encountered.
() - Parentheses are used to group a sub-expression, or multiple
1497 | - The "bar" character works like an "or" conditional statement. A
1498 match is successful if the sub-expression on either side of "|"
1501 s/string1/string2/g - This is used to rewrite strings of text.
1502 "string1" is replaced by "string2" in this example.
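Each of the metacharacters listed above can be tried out quickly in Python, whose re module follows much the same syntax as Perl's. The patterns and sample strings below are invented for illustration and have nothing to do with Junkbuster's configuration; check is a hypothetical helper that tests whether the whole string matches.

```python
import re

# (pattern, sample text, whether the whole text is expected to match)
EXAMPLES = [
    (r"a.c", "a@c", True),        # "." matches any single character
    (r"colou?r", "color", True),  # "?" makes the "u" optional
    (r"ab+", "a", False),         # "+" needs at least one "b"
    (r"ab*", "a", True),          # "*" allows zero "b"s
    (r"a\.b", "axb", False),      # "\." is a literal dot
    (r"a\.b", "a.b", True),
    (r"[abc]x", "bx", True),      # any one of the bracketed characters
    (r"(ab)+", "ababab", True),   # "()" groups "ab" so "+" repeats it
    (r"cat|dog", "dog", True),    # "|" means either side may match
]


def check(pattern: str, text: str) -> bool:
    """Return True if pattern matches the whole of text."""
    return re.fullmatch(pattern, text) is not None


def substitute(text: str) -> str:
    """The Python counterpart of s/string1/string2/g."""
    return re.sub(r"string1", "string2", text)
```

Running check over each row of EXAMPLES gives the result in the third column.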
These are just some of the ones you are likely to use when matching
URLs with Junkbuster, and are a long way from a definitive list. This
is enough to get us started with a few simple examples which may be
1509 /.*/banners/.* - A simple example that uses the common combination of
1510 "." and "*" to denote any character, zero or more times. In other
1511 words, any string at all. So we start with a literal forward slash,
1512 then our regular expression pattern (".*") another literal forward
1513 slash, the string "banners", another forward slash, and lastly another
1514 ".*". We are building a directory path here. This will match any file
1515 with the path that has a directory named "banners" in it. The ".*"
1516 matches any characters, and this could conceivably be more forward
1517 slashes, so it might expand into a much longer looking path. For
1518 example, this could match:
1519 "/eye/hate/spammers/banners/annoy_me_please.gif", or just
1520 "/banners/annoying.html", or almost an infinite number of other
1521 possible combinations, just so it has "banners" in the path somewhere.
And now something a little more complex:
1525 /.*/adv((er)?ts?|ertis(ing|ements?))?/ - We have several literal
1526 forward slashes again ("/"), so we are building another expression
1527 that is a file path statement. We have another ".*", so we are
1528 matching against any conceivable sub-path, just so it matches our
1529 expression. The only true literal that must match our pattern is adv,
1530 together with the forward slashes. What comes after the "adv" string
1531 is the interesting part.
1533 Remember the "?" means the preceding expression (either a literal
1534 character or anything grouped with "(...)" in this case) can exist or
1535 not, since this means either zero or one match. So
1536 "((er)?ts?|ertis(ing|ements?))" is optional, as are the individual
1537 sub-expressions: "(er)", "(ing|ements?)", and the "s". The "|" means
1538 "or". We have two of those. For instance, "(ing|ements?)", can expand
1539 to match either "ing" OR "ements?". What is being done here, is an
1540 attempt at matching as many variations of "advertisement", and
1541 similar, as possible. So this would expand to match just "adv", or
1542 "advert", or "adverts", or "advertising", or "advertisement", or
1543 "advertisements". You get the idea. But it would not match
1544 "advertizements" (with a "z"). We could fix that by changing our
1545 regular expression to: "/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/",
1546 which would then match either spelling.
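The expansion just described can be checked mechanically. The following Python sketch tests both versions of the expression against each spelling; the directory prefix "/images/" and the helper name matches are arbitrary choices for the test, and case-insensitive matching anchored at the start of the path is assumed.

```python
import re

ORIGINAL = r"/.*/adv((er)?ts?|ertis(ing|ements?))?/"
WITH_Z = r"/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/"

WORDS = ["adv", "advert", "adverts", "advertising",
         "advertisement", "advertisements"]


def matches(pattern: str, directory: str) -> bool:
    """Test the pattern against a path containing the given directory name."""
    path = "/images/" + directory + "/"  # hypothetical path used for testing
    return re.match(pattern, path, re.IGNORECASE) is not None
```

Every entry in WORDS matches ORIGINAL, "advertizements" matches only WITH_Z, and WITH_Z still matches the "s" spellings.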
/.*/advert[0-9]+\.(gif|jpe?g) - Again another path statement with
forward slashes. Anything in the square brackets "[]" can be matched.
This is using "0-9" as a shorthand expression to mean any digit zero
through nine. It is the same as saying "0123456789". So any digit
matches. The "+" means one or more of the preceding expression must be
included. The preceding expression here is what is in the square
brackets -- in this case, any digit zero through nine. Then, at the
end, we have a grouping: "(gif|jpe?g)". This includes a "|", so this
needs to match the expression on either side of that bar character.
A simple "gif" is on one side, and the other side will in turn
match either "jpeg" or "jpg", since the "?" means the letter "e" is
optional and can be matched once or not at all. So we are building an
expression here to match GIF or JPEG image files. It must
include the literal string "advert", then one or more digits, and a
"." (which is now a literal, and not a special character, since it is
escaped with "\"), and lastly either "gif", or "jpeg", or "jpg". Some
possible matches would include: "//advert1.jpg",
"/nasty/ads/advert1234.gif", "/banners/from/hell/advert99.jpg". It
would not match "advert1.gif" (no leading slash), or "/adverts232.jpg"
(the expression does not include an "s"), or "/advert1.jsp" ("jsp" is
not in the expression anywhere).
s/microsoft(?!.com)/MicroSuck/i - This is a substitution. "MicroSuck"
will replace any occurrence of "microsoft". The "i" at the end of the
expression means ignore case. The "(?!.com)" means the match should
fail if "microsoft" is followed by ".com". In other words, this acts
like a "NOT" modifier. In case this is a hyperlink, we don't want to
1577 We are barely scratching the surface of regular expressions here so
1578 that you can understand the default Junkbuster configuration files,
1579 and maybe use this knowledge to customize your own installation. There
1580 is much, much more that can be done with regular expressions. Now that
1581 you know enough to get started, you can learn more on your own :/
1583 More reading on Perl Compatible Regular expressions:
1584 [55]http://www.perldoc.com/perl5.6/pod/perlre.html
1588 1. http://ijbswa.sourceforge.net/user-manual/
1589 2. mailto:ijbswa-developers@lists.sourceforge.net
1590 3. file://localhost/home/swa/sf/current/doc/source/tmp.html#INTRODUCTION
1591 4. file://localhost/home/swa/sf/current/doc/source/tmp.html#AEN27
1592 5. file://localhost/home/swa/sf/current/doc/source/tmp.html#INSTALLATION
1593 6. file://localhost/home/swa/sf/current/doc/source/tmp.html#INSTALLATION-SOURCE
1594 7. file://localhost/home/swa/sf/current/doc/source/tmp.html#INSTALLATION-RH
1595 8. file://localhost/home/swa/sf/current/doc/source/tmp.html#INSTALLATION-SUSE
1596 9. file://localhost/home/swa/sf/current/doc/source/tmp.html#INSTALLATION-OS2
1597 10. file://localhost/home/swa/sf/current/doc/source/tmp.html#INSTALLATION-WIN
1598 11. file://localhost/home/swa/sf/current/doc/source/tmp.html#INSTALLATION-OTHER
1599 12. file://localhost/home/swa/sf/current/doc/source/tmp.html#CONFIGURATION
1600 13. file://localhost/home/swa/sf/current/doc/source/tmp.html#AEN162
1601 14. file://localhost/home/swa/sf/current/doc/source/tmp.html#ACTIONSFILE
1602 15. file://localhost/home/swa/sf/current/doc/source/tmp.html#FILTERFILE
1603 16. file://localhost/home/swa/sf/current/doc/source/tmp.html#AEN1119
1604 17. file://localhost/home/swa/sf/current/doc/source/tmp.html#QUICKSTART
1605 18. file://localhost/home/swa/sf/current/doc/source/tmp.html#CONTACT
1606 19. file://localhost/home/swa/sf/current/doc/source/tmp.html#COPYRIGHT
1607 20. file://localhost/home/swa/sf/current/doc/source/tmp.html#AEN1185
1608 21. file://localhost/home/swa/sf/current/doc/source/tmp.html#AEN1191
1609 22. file://localhost/home/swa/sf/current/doc/source/tmp.html#SEEALSO
1610 23. file://localhost/home/swa/sf/current/doc/source/tmp.html#APPENDIX
1611 24. file://localhost/home/swa/sf/current/doc/source/tmp.html#REGEX
1613 26. http://sourceforge.net/projects/ijbswa/
1614 27. http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/ijbswa/current/
1615 28. http://hobbes.nmsu.edu/cgi-bin/h-search?sh=1&button=Search&key=emxrt.zip&stype=all&sort=type&dir=%2Fpub%2Fos2%2Fdev%2Femx%2Fv0.9d
1616 29. http://hobbes.nmsu.edu/cgi-bin/h-search?sh=1&key=gnupack&stype=all&sort=type&dir=%2Fpub%2Fos2%2Fapps
1617 30. http://www.gnu.org/
1619 32. file://localhost/home/swa/sf/current/doc/source/tmp.html#ACTIONSFILE
1623 36. http://i.j.b/show-url-info
1625 38. http://www.perldoc.com/perl5.6/pod/perlre.html
1626 39. file://localhost/home/swa/sf/current/doc/source/tmp.html#REGEX
1628 41. http://sourceforge.net/tracker/?atid=361118&group_id=11118&func=browse
1629 42. http://sourceforge.net/mail/?group_id=11118
1630 43. http://sourceforge.net/tracker/?group_id=11118&atid=111118
1631 44. http://www.gnu.org/copyleft/gpl.html
1632 45. http://www.junkbusters.com/ht/en/ijbfaq.html
1633 46. http://www.waldherr.org/junkbuster/
1634 47. http://sourceforge.net/projects/ijbswa/
1635 48. http://sourceforge.net/projects/ijbswa
1636 49. http://ijbswa.sourceforge.net/
1638 51. http://www.junkbusters.com/ht/en/cookies.html
1639 52. http://www.waldherr.org/junkbuster/
1640 53. http://privacy.net/analyze/
1641 54. http://www.squid-cache.org/
1642 55. http://www.perldoc.com/perl5.6/pod/perlre.html