+ Generally, a pattern has the form <domain>/<path>, where both the
+ <domain> and <path> part are optional. If you only specify a domain
+ part, the "/" can be left out:
+
+ www.example.com - is a domain only pattern and will match any request
+ to "www.example.com".
+
+ www.example.com/ - means exactly the same.
+
+ www.example.com/index.html - matches only the single document
+ "/index.html" on "www.example.com".
+
+ /index.html - matches the document "/index.html", regardless of the
+ domain.
+
+ index.html - matches nothing, since it would be interpreted as a
+ domain name and there is no top-level domain called ".html".
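+
+ As a rough illustration of the <domain>/<path> split described above,
+ here is a small Python sketch. It is not Privoxy's actual code, and
+ the helper name split_pattern is made up for this example:
+
```python
def split_pattern(pattern):
    """Split a URL pattern into (domain, path); either part may be empty."""
    # Everything before the first "/" is the domain part,
    # everything from that "/" onward is the path part.
    domain, slash, rest = pattern.partition("/")
    return domain, slash + rest


for example in ["www.example.com", "www.example.com/index.html",
                "/index.html", "index.html"]:
    print(example, "->", split_pattern(example))
```
+
+ Note how "index.html" ends up entirely in the domain part, which is
+ why it matches nothing.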
+
+ The matching of the domain part offers some flexible options: if the
+ domain starts or ends with a dot, it becomes unanchored at that end.
+ For example:
+
+ .example.com - matches any domain that ENDS in ".example.com".
+
+ www. - matches any domain that STARTS with "www".
+
+ Additionally, there are wild-cards that you can use in the domain
+ names themselves. They work much like shell wild-cards: "*" stands
+ for zero or more arbitrary characters, and "?" stands for any single
+ character. You can also define character classes in square brackets,
+ and all of these can be freely mixed:
+
+ ad*.example.com - matches "adserver.example.com", "ads.example.com",
+ etc., but not "sfads.example.com".
+
+ *ad*.example.com - matches all of the above, and then some.
+
+ .?pix.com - matches "www.ipix.com", "pictures.epix.com",
+ "a.b.c.d.e.upix.com", etc.
+
+ www[1-9a-ez].example.com - matches "www1.example.com",
+ "www4.example.com", "wwwd.example.com", "wwwz.example.com", etc., but
+ not "wwww.example.com".
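+
+ The domain rules above can be approximated with shell-style globbing.
+ The following Python sketch is only an approximation under stated
+ assumptions: domain_matches is a hypothetical helper, and Python's
+ "*" also matches dots, which may differ from Privoxy's own matcher:
+
```python
from fnmatch import fnmatchcase


def domain_matches(pattern, host):
    """Approximate the domain matching rules with shell-style wild-cards."""
    pattern, host = pattern.lower(), host.lower()
    if pattern.startswith("."):
        pattern = "*" + pattern   # leading dot: unanchored on the left
    if pattern.endswith("."):
        pattern = pattern + "*"   # trailing dot: unanchored on the right
    return fnmatchcase(host, pattern)


print(domain_matches("ad*.example.com", "adserver.example.com"))
print(domain_matches("ad*.example.com", "sfads.example.com"))
print(domain_matches("www[1-9a-ez].example.com", "wwww.example.com"))
```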
+
+ If Privoxy was compiled with "pcre" support (default), Perl compatible
+ regular expressions can be used. See the pcre/docs/ directory or "man
+ perlre" (also available on
+ [50]http://www.perldoc.com/perl5.6/pod/perlre.html) for details. A
+ brief discussion of regular expressions is in the [51]Appendix. For
+ instance:
+
+ /.*/advert[0-9]+\.jpe?g - would match a URL from any domain, with any
+ path that includes "advert" followed immediately by one or more
+ digits, then a "." and ending in either "jpeg" or "jpg". So we match
+ "example.com/ads/advert2.jpg", and
+ "www.example.com/ads/banners/advert39.jpeg", but not
+ "www.example.com/ads/banners/advert39.gif" (no gifs in the example
+ pattern).
+
+ Please note that matching in the path is case INSENSITIVE by default,
+ but you can switch to case sensitive at any point in the pattern by
+ using the "(?-i)" switch:
+
+ www.example.com/(?-i)PaTtErN.* - will match only documents whose path
+ starts with "PaTtErN" in exactly this capitalization.
+ _________________________________________________________________
+
+3.4.2. Actions
+
+ Actions are enabled if preceded with a "+", and disabled if preceded
+ with a "-". Actions are invoked by enclosing the action name in curly
+ braces (e.g. {+some_action}), followed by a list of URLs to which the
+ action applies. There are three classes of actions:
+
+ * Boolean (e.g. "+/-block"):
+ {+name} # enable this action
+ {-name} # disable this action
+
+ * Parameterized (e.g. "+/-hide-user-agent"):
+ {+name{param}} # enable action and set parameter to "param"
+ {-name} # disable action
+
+ * Multi-value (e.g. "{+/-add-header{Name: value}}",
+ "{+/-wafer{name=value}}"):
+ {+name{param}} # enable action and add parameter "param"
+ {-name{param}} # remove the parameter "param"
+ {-name} # disable this action totally
+
+ If nothing is specified in this file, no "actions" are taken. So in
+ this case Privoxy would just be a normal, non-blocking,
+ non-anonymizing proxy. You must specifically enable the privacy and
+ blocking features you need (although the provided default.action
+ file will give a good starting point).
+
+ Later defined actions always over-ride earlier ones. For multi-valued
+ actions, the actions are applied in the order they are specified.
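+
+ The "later overrides earlier" rule can be pictured with a short
+ Python sketch. It handles Boolean actions only, and apply_actions is
+ a hypothetical helper, not part of Privoxy:
+
```python
def apply_actions(specs, state=None):
    """Apply "+name" / "-name" specs in order; later ones win."""
    state = dict(state or {})
    for spec in specs:
        if len(spec) < 2 or spec[0] not in "+-":
            raise ValueError("expected +name or -name, got %r" % (spec,))
        # "+" enables the action, "-" disables it, replacing any
        # earlier setting for the same action name.
        state[spec[1:]] = spec.startswith("+")
    return state


print(apply_actions(["+block", "+image", "-block"]))
```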
+
+ The valid Privoxy "actions" are:
+
+ * Add the specified HTTP header, which is not checked for validity.
+ You may specify this many times to specify many different headers:
+ +add-header{Name: value}
+
+ * Block this URL totally. In a default installation, a "blocked" URL
+ will result in a bright red banner that says "BLOCKED", with a
+ reason why it is being blocked.
+ +block
+
+ * De-animate all animated GIF images, i.e. reduce them to a single
+ frame. This will also shrink the images considerably (in bytes,
+ not pixels!). If the option "first" is given, the first frame of
+ the animation is used as the replacement. If "last" is given, the
+ last frame of the animation is used instead, which probably makes
+ more sense for most banner animations, but also carries the risk
+ of not showing the entire last frame (if it is only a delta to an
+ earlier frame).
+ +deanimate-gifs{last}
+ +deanimate-gifs{first}
+
+ * "+downgrade" will downgrade HTTP/1.1 client requests to HTTP/1.0
+ and downgrade the responses as well. Use this action for servers
+ that use HTTP/1.1 protocol features that Privoxy doesn't handle
+ well yet. HTTP/1.1 is only partially implemented. Default is not
+ to downgrade requests.
+ +downgrade
+
+ * Many sites, like yahoo.com, don't just link to other sites.
+ Instead, they will link to some script on their own server, giving
+ the destination as a parameter, which will then redirect you to
+ the final target. URLs resulting from this scheme typically look
+ like: http://some.place/some_script?http://some.where-else.
+ Sometimes, there are even multiple consecutive redirects encoded
+ in the URL. These redirections via scripts make your web browsing
+ more traceable, since the server from which you follow such a link
+ can see where you go to. Apart from that, valuable bandwidth and
+ time are wasted while your browser asks the server for one
+ redirect after the other. Plus, it feeds the advertisers.
+ The "+fast-redirects" option enables interception of these
+ requests by Privoxy, which will cut off all but the last valid URL
+ in the request and send a local redirect back to your browser
+ without contacting the remote site.
+ +fast-redirects
+
+ * Apply the filters in the section_header section of the
+ default.filter file to the site(s). default.filter sections are
+ grouped according to like functionality.
+ +filter{section_header}
+
+ Filter sections that are pre-defined in the supplied
+ default.filter include:
+
+ html-annoyances: Get rid of particularly annoying HTML abuse.
+
+ js-annoyances: Get rid of particularly annoying JavaScript abuse.
+
+ no-popups: Kill all popups in JS and HTML
+
+ frameset-borders: Give frames a border
+
+ webbugs: Squish WebBugs (1x1 invisible GIFs used for user tracking)
+
+ no-refresh: Automatic refresh sucks on auto-dialup lines
+
+ fun: Text replacements for subversive browsing fun!
+
+ nimda: Remove (virus) Nimda code.
+
+ banners-by-size: Kill banners by size
+
+ crude-parental: Kill all web pages that contain the words "sex" or
+ "warez"
+
+ * Block any existing X-Forwarded-for header, and do not add a new
+ one:
+ +hide-forwarded
+
+ * If the browser sends a "From:" header containing your e-mail
+ address, this either completely removes the header ("block"), or
+ changes it to the specified e-mail address.
+ +hide-from{block}
+ +hide-from{spam@sittingduck.xqq}
+
+ * Don't send the "Referer:" (sic) header to the web site. You can
+ block it, forge a URL to the same server as the request (which is
+ preferred because some sites will not send images otherwise) or
+ set it to a constant string of your choice.
+ +hide-referer{block}
+ +hide-referer{forge}
+ +hide-referer{http://nowhere.com}
+
+ * Alternative spelling of "+hide-referer". It takes the same
+ parameters as "+hide-referer" and can be freely mixed with it.
+ ("referrer" is the correct English spelling; however, the HTTP
+ specification has a bug - it requires it to be spelled "referer".)
+ +hide-referrer{...}
+
+ * Change the "User-Agent:" header so web servers can't tell your
+ browser type. Warning! This breaks many web sites. Specify the
+ user-agent value you want. For example, to pretend to be using
+ Netscape on Linux:
+ +hide-user-agent{Mozilla (X11; I; Linux 2.0.32 i586)}
+
+ * Treat this URL as an image. This only matters if it's also
+ "+block"ed, in which case a "blocked" image can be sent rather
+ than an HTML page. See "+image-blocker{}" below for the control
+ over what is actually sent. If you want invisible ads, they should
+ be defined as images and blocked, and "image-blocker" should be
+ set to "blank".
+ +image
+
+ * Decides what to do with URLs that end up tagged with "{+block
+ +image}", e.g. an advertisement. There are five options.
+ "-image-blocker" will send an HTML "blocked" page, usually
+ resulting in a "broken image" icon. "+image-blocker{logo}" will
+ send a Privoxy logo image. "+image-blocker{blank}" will send a 1x1
+ transparent GIF image. "+image-blocker{pattern}" will send a
+ checkerboard type pattern, which scales better than the logo
+ (which can get blocky if the browser enlarges it too much). And
+ finally, "+image-blocker{http://xyz.com}" will send an HTTP
+ temporary redirect to the specified image. This has the advantage
+ of the image being cached by the browser, which will speed up the
+ display.
+ +image-blocker{logo}
+ +image-blocker{blank}
+ +image-blocker{pattern}
+ +image-blocker{http://i.j.b/send-banner}
+
+ * By default (i.e. in the absence of a "+limit-connect" action),
+ Privoxy will, as a precaution, only allow CONNECT requests to port
+ 443, which is the standard port for https.
+ The CONNECT method exists in HTTP to allow access to secure
+ websites (https:// URLs) through proxies. It works very simply:
+ the proxy connects to the server on the specified port, and then
+ short-circuits its connections to the client and to the remote
+ server. This can be a big security hole, since CONNECT-enabled
+ proxies can be abused as TCP relays very easily.
+ If you want to allow CONNECT for more ports than this, or want to
+ forbid CONNECT altogether, you can specify a comma separated list
+ of ports and port ranges (the latter using dashes, with the
+ minimum defaulting to 0 and max to 65K):
+ +limit-connect{443} # This is the default and need not be specified.
+ +limit-connect{80,443} # Ports 80 and 443 are OK.
+ +limit-connect{-3, 7, 20-100, 500-} # Ports less than 3, 7, 20 to 100,
+ # and above 500 are OK.
+
+ * "+no-compression" prevents the website from compressing the data.
+ Some websites do this, which can be a problem for Privoxy, since
+ "+filter", "+no-popup" and "+deanimate-gifs" will not work on
+ compressed data. This will slow down connections to those
+ websites, though. By default, "no-compression" is turned on.
+ +no-compression
+
+ * If the website sets cookies, "no-cookies-keep" will make sure they
+ are erased when you exit and restart your web browser. This makes
+ profiling cookies useless, but won't break sites which require
+ cookies so that you can log in for transactions. Default: on.
+ +no-cookies-keep
+
+ * Prevent the website from reading cookies:
+ +no-cookies-read
+
+ * Prevent the website from setting cookies:
+ +no-cookies-set
+
+ * Filter the website through a built-in filter to disable those
+ obnoxious JavaScript pop-up windows via window.open(), etc. The
+ two alternative spellings are equivalent.
+ +no-popup
+ +no-popups
+
+ * This action only applies if you are using a jarfile for saving
+ cookies. It sends a cookie to every site stating that you do not
+ accept any copyright on cookies sent to you, and asking them not
+ to track you. Of course, this is a (relatively) unique header they
+ could use to track you.
+ +vanilla-wafer
+
+ * This allows you to add an arbitrary cookie. It can be specified
+ multiple times in order to add as many cookies as you like.
+ +wafer{name=value}
+
+ The meaning of any of the above is reversed by preceding the action
+ with a "-", in place of the "+".
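+
+ To make the "+limit-connect" port list syntax described in the list
+ above more concrete, here is a Python sketch of how such a list could
+ be interpreted. The function names are hypothetical, and two details
+ are assumptions rather than documented behaviour: range bounds are
+ treated as inclusive, and "65K" is taken to mean 65535:
+
```python
def parse_port_list(spec, lowest=0, highest=65535):
    """Turn "-3, 7, 20-100, 500-" into a list of (low, high) ranges."""
    ranges = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if "-" in item:
            low, _, high = item.partition("-")
            # A missing bound falls back to the default minimum/maximum.
            ranges.append((int(low) if low else lowest,
                           int(high) if high else highest))
        else:
            ranges.append((int(item), int(item)))
    return ranges


def port_allowed(port, spec):
    """Return True if port falls inside any range of the port list."""
    return any(low <= port <= high for low, high in parse_port_list(spec))


print(parse_port_list("-3, 7, 20-100, 500-"))
print(port_allowed(443, "80,443"))
```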
+
+ Some examples:
+
+ Turn off cookies by default, then allow a few through for specified
+ sites:
+
+ # Turn off all persistent cookies
+ { +no-cookies-read }
+ { +no-cookies-set }
+ # Allow cookies for this browser session ONLY
+ { +no-cookies-keep }
+ # Exceptions to the above, sites that benefit from persistent cookies
+ { -no-cookies-read }
+ { -no-cookies-set }
+ { -no-cookies-keep }
+ .javasoft.com
+ .sun.com
+ .yahoo.com
+ .msdn.microsoft.com
+ .redhat.com
+ # Alternative way of saying the same thing
+ {-no-cookies-set -no-cookies-read -no-cookies-keep}
+ .sourceforge.net
+ .sf.net
+
+ Now enable "+fast-redirects" to bypass redirect scripts, and then
+ allow two exceptions:
+
+ # Turn them off!
+ {+fast-redirects}
+
+ # Reverse it for these two sites, which don't work right without it.
+ {-fast-redirects}
+ www.ukc.ac.uk/cgi-bin/wac\.cgi\?
+ login.yahoo.com
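+
+ What "+fast-redirects" does to a request can be pictured roughly as
+ follows. This Python sketch is a simplification under assumptions:
+ last_embedded_url is a made-up name, only plain "http://" targets
+ are handled, and percent-encoded destinations are ignored:
+
```python
def last_embedded_url(url):
    """Return the last "http://..." embedded in url, or None."""
    # Start the search at index 1 so that the scheme of the outer
    # URL itself is not counted as an embedded destination.
    index = url.rfind("http://", 1)
    return url[index:] if index > 0 else None


print(last_embedded_url(
    "http://some.place/some_script?http://some.where-else"))
```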
+
+ Turn on page filtering according to rules in the defined sections of
+ the filter file (default.filter), and make one exception for
+ sourceforge:
+
+ # Run everything through the filter file, using only the
+ # specified sections:
+ {+filter{html-annoyances} +filter{js-annoyances} +filter{no-popups}\
+ +filter{webbugs} +filter{nimda} +filter{banners-by-size}}
+
+ # Then disable filtering of code from sourceforge!
+ {-filter}
+ .cvs.sourceforge.net
+
+ Now some URLs that we want "blocked", i.e. we won't see them. Many of
+ these use regular expressions that will expand to match multiple URLs:
+
+ # Blocklist:
+ {+block}
+ /.*/(.*[-_.])?ads?[0-9]?(/|[-_.].*|\.(gif|jpe?g))
+ /.*/(.*[-_.])?count(er)?(\.cgi|\.dll|\.exe|[?/])
+ /.*/(ng)?adclient\.cgi
+ /.*/(plain|live|rotate)[-_.]?ads?/
+ /.*/(sponsor)s?[0-9]?/
+ /.*/_?(plain|live)?ads?(-banners)?/
+ /.*/abanners/
+ /.*/ad(sdna_image|gifs?)/
+ /.*/ad(server|stream|juggler)\.(cgi|pl|dll|exe)
+ /.*/adbanners/
+ /.*/adserver
+ /.*/adstream\.cgi
+ /.*/adv((er)?ts?|ertis(ing|ements?))?/
+ /.*/banner_?ads/
+ /.*/banners?/
+ /.*/banners?\.cgi/
+ /.*/cgi-bin/centralad/getimage
+ /.*/images/addver\.gif
+ /.*/images/marketing/.*\.(gif|jpe?g)
+ /.*/popupads/
+ /.*/siteads/
+ /.*/sponsor.*\.gif
+ /.*/sponsors?[0-9]?/
+ /.*/advert[0-9]+\.jpg
+ /Media/Images/Adds/
+ /ad_images/
+ /adimages/
+ /.*/ads/
+ /bannerfarm/
+ /grafikk/annonse/
+ /graphics/defaultAd/
+ /image\.ng/AdType
+ /image\.ng/transactionID
+ /images/.*/.*_anim\.gif # alvin brattli
+ /ip_img/.*\.(gif|jpe?g)
+ /rotateads/
+ /rotations/
+ /worldnet/ad\.cgi
+ /cgi-bin/nph-adclick.exe/
+ /.*/Image/BannerAdvertising/
+ /.*/ad-bin/
+ /.*/adlib/server\.cgi
+ /autoads/
+
+ Note that many of these actions have the potential to cause a page to
+ misbehave, possibly even not to display at all. There are many ways a
+ site designer may choose to design a site, and many kinds of HTTP
+ header content the site may depend on. There is no way to have hard
+ and fast rules for all sites. See the [52]Appendix for a brief
+ example on troubleshooting actions.
+ _________________________________________________________________
+
+3.4.3. Aliases
+
+ Custom "actions", known to Privoxy as "aliases", can be defined by
+ combining other "actions". These can in turn be invoked just like the
+ built-in "actions". Currently, an alias name can contain any
+ character except space, tab, "=", "{" or "}". But please use only
+ "a"-"z", "0"-"9", "+", and "-". Alias names are not case sensitive,
+ and must be defined before anything else in the default.action file!
+ And there can only be one set of "aliases" defined.
+
+ Now let's define a few aliases:
+
+ # Useful custom aliases we can use later. These must come first!
+ {{alias}}
+ +no-cookies = +no-cookies-set +no-cookies-read
+ -no-cookies = -no-cookies-set -no-cookies-read
+ fragile = -block -no-cookies -filter -fast-redirects -hide-referer -no-popups
+ shop = -no-cookies -filter -fast-redirects
+ +imageblock = +block +image
+ #For people who don't like to type too much: ;-)
+ c0 = +no-cookies
+ c1 = -no-cookies
+ c2 = -no-cookies-set +no-cookies-read
+ c3 = +no-cookies-set -no-cookies-read
+ #... etc. Customize to your heart's content.
+
+ Some examples using our "shop" and "fragile" aliases from above:
+
+ # These sites are very complex and require
+ # minimal interference.
+ {fragile}
+ .office.microsoft.com
+ .windowsupdate.microsoft.com
+ .nytimes.com
+ # Shopping sites - still want to block ads.
+ {shop}
+ .quietpc.com
+ .worldpay.com # for quietpc.com
+ .jungle.com
+ .scan.co.uk
+ # These shops require pop-ups
+ {shop -no-popups}
+ .dabs.com
+ .overclockers.co.uk
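+
+ Since aliases may themselves refer to other aliases (as "shop" does
+ with "-no-cookies"), resolving them amounts to a recursive expansion.
+ The Python sketch below illustrates the idea only; the ALIASES table
+ and the expand helper are assumptions for this example, and no
+ protection against circular definitions is attempted:
+
```python
ALIASES = {
    "+no-cookies": ["+no-cookies-set", "+no-cookies-read"],
    "-no-cookies": ["-no-cookies-set", "-no-cookies-read"],
    "shop": ["-no-cookies", "-filter", "-fast-redirects"],
}


def expand(names, aliases=ALIASES):
    """Recursively replace alias names with the actions they stand for."""
    result = []
    for name in names:
        key = name.lower()   # alias names are not case sensitive
        if key in aliases:
            result.extend(expand(aliases[key], aliases))
        else:
            result.append(name)
    return result


print(expand(["shop", "-no-popups"]))
```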
+ _________________________________________________________________
+
+3.5. The Filter File
+
+ Any web page can be dynamically modified with the filter file. This
+ modification can be removal, or re-writing, of any web page content,
+ including tags and non-visible content. The default filter file is
+ default.filter, located in the config directory.
+
+ The included example file is divided into sections. Each section
+ begins with the FILTER keyword, followed by the identifier for that
+ section, e.g. "FILTER: webbugs". Each section performs a similar type
+ of filtering, such as "html-annoyances".
+
+ This file uses regular expressions to alter or remove any string in
+ the target page. The expressions can only operate on one line at a
+ time. Some examples from the included default.filter:
+
+ Make new browser windows resizable again, restore their location and
+ status bars, and strip the <BLINK> tag:
+
+ FILTER: html-annoyances
+ # New browser windows should be resizeable and have a location and
+ # status bar. Make it so.
+ #
+ s/resizable="?(no|0)"?/resizable=1/ig
+ s/noresize/yesresize/ig
+ s/location="?(no|0)"?/location=1/ig
+ s/status="?(no|0)"?/status=1/ig
+ s/scrolling="?(no|0|Auto)"?/scrolling=1/ig
+ s/menubar="?(no|0)"?/menubar=1/ig
+ # The <BLINK> tag was a crime!
+ #
+ s*<blink>|</blink>**ig
+ # Is this evil?
+ #
+ #s/framespacing="?(no|0)"?//ig
+ #s/margin(height|width)=[0-9]*//gi
+
+ Just for kicks, replace any occurrence of "Microsoft" with
+ "MicroSuck", and have a little fun with topical buzzwords:
+
+ FILTER: fun
+ s/microsoft(?!.com)/MicroSuck/ig
+ # Buzzword Bingo:
+ #
+ s/industry-leading|cutting-edge|award-winning/<font color=red><b>BINGO!</b></font>/ig
+
+ Kill those pesky little web-bugs:
+
+ # webbugs: Squish WebBugs (1x1 invisible GIFs used for user tracking)
+ FILTER: webbugs
+ s/<img\s+[^>]*?(width|height)\s*=\s*['"]?1\D[^>]*?(width|height)\s*=\s*['"]?1(\D[^>]*?)?>/<!-- Squished WebBug -->/sig
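+
+ For readers more at home in a scripting language, the effect of a few
+ of the substitution jobs shown above can be tried out in Python. This
+ is only a sketch: JOBS and apply_filter are names invented here, and
+ Python's re module stands in for the pcre-based engine Privoxy uses:
+
```python
import re

# (pattern, replacement) pairs transcribed from the html-annoyances
# examples; every job there carries the "i" and "g" flags.
JOBS = [
    (r'resizable="?(no|0)"?', "resizable=1"),
    (r"noresize", "yesresize"),
    (r"<blink>|</blink>", ""),
]


def apply_filter(text, jobs=JOBS):
    """Run each substitution job over the text, in order."""
    for pattern, replacement in jobs:
        # re.sub replaces all occurrences (like "g"); IGNORECASE is "i".
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    return text


print(apply_filter("<BLINK>hi</BLINK> <frame NORESIZE>"))
```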
+ _________________________________________________________________
+
+3.6. Templates
+
+ When Privoxy displays one of its internal pages, such as a 404 Not
+ Found error page, it uses the appropriate template. On Linux, BSD, and
+ Unix, these are located in /etc/privoxy/templates by default. These
+ may be customized, if desired.
+ _________________________________________________________________
+
+4. Quickstart to Using Privoxy
+
+ Install package, then run and enjoy! Privoxy is typically started by
+ specifying the main configuration file to be used on the command line.
+ Example Unix startup command:
+
+
+ # /usr/sbin/privoxy /etc/privoxy/config
+
+
+ An init script is provided for SuSE and RedHat.
+
+ For SuSE: /etc/rc.d/privoxy start
+
+ For RedHat: /etc/rc.d/init.d/privoxy start
+
+ If no configuration file is specified on the command line, Privoxy
+ will look for a file named config in the current directory (except on
+ Win32, where it will try config.txt). If no file is specified on the
+ command line and no default configuration file can be found, Privoxy
+ will fail to start.
+
+ Be sure your browser is set to use the proxy, which is by default at
+ localhost, port 8118. With Netscape (and Mozilla), this can be set
+ under Edit -> Preferences -> Advanced -> Proxies -> HTTP Proxy. For
+ Internet Explorer: Tools -> Internet Properties -> Connections -> LAN
+ Settings. Then, check "Use Proxy" and fill in the appropriate info
+ (Address: localhost, Port: 8118). Do the same for HTTPS if you want
+ HTTPS proxy support too.
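+
+ Programs other than browsers can be pointed at the proxy in a similar
+ way. As one possible illustration (a sketch that assumes the default
+ address of localhost, port 8118; make_opener is a hypothetical
+ helper), a Python script could route its HTTP requests like this:
+
```python
import urllib.request

PROXY = "http://localhost:8118"   # assumed default proxy address


def make_opener(proxy=PROXY):
    """Build a urllib opener that sends HTTP requests via the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy})
    return urllib.request.build_opener(handler)


opener = make_opener()
# Fetching a page requires a running proxy and network access:
# html = opener.open("http://www.example.com/").read()
```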
+
+ The included default configuration files should give a reasonable
+ starting point, though they may be somewhat aggressive in blocking
+ junk. You will probably want to keep an eye out for sites that require
+ persistent cookies, and add these to default.action as needed. By
+ default, most of these will be accepted only during the current
+ browser session, until you add them to the configuration. If you want
+ the browser to handle this instead, you will need to edit
+ default.action and disable this feature. If you use more than one
+ browser, it would make more sense to let Privoxy handle this, in which
+ case the browser(s) should be set to accept all cookies.
+
+ If a particular site shows problems loading properly, try adding it to
+ the {fragile} section of default.action. This will turn off most
+ actions for this site.
+
+ Privoxy is HTTP/1.1 compliant, but not all 1.1 features are
+ implemented yet. If browsers that support HTTP/1.1 (like Mozilla or
+ recent versions of I.E.) experience problems, you might try to force
+ HTTP/1.0 compatibility. For Mozilla, look under Edit -> Preferences ->
+ Debug -> Networking. Alternatively, set the "+downgrade" action in
+ default.action.
+
+ After running Privoxy for a while, you can start to fine tune the
+ configuration to suit your personal, or site, preferences and
+ requirements. There are many, many aspects that can be customized.
+ "Actions" (as specified in default.action) can be adjusted by pointing
+ your browser to [53]http://i.j.b/, and then following the link to
+ "edit the actions list". (This is an internal page and does not
+ require Internet access.)
+
+ In fact, various aspects of Privoxy configuration can be viewed from
+ this page, including current configuration parameters, source code
+ version numbers, the browser's request headers, and "actions" that
+ apply to a given URL. In addition to the default.action file editor
+ mentioned above, Privoxy can also be turned "on" and "off" from this
+ page.
+
+ If you encounter problems, please verify that it is a Privoxy bug by
+ disabling Privoxy and then trying the same page. Also, try another
+ browser if possible to eliminate browser or site problems. Before
+ reporting it as a bug, check whether an enabled configuration option
+ is causing the page not to load. You can then add an exception for
+ that page or site. If it is a bug, please report it to the
+ developers (see below).
+ _________________________________________________________________
+
+4.1. Command Line Options
+
+ Privoxy may be invoked with the following command-line options:
+
+ * --version
+ Print version info and exit. Unix only.
+ * --help
+ Print short usage info and exit. Unix only.
+ * --no-daemon
+ Don't become a daemon, i.e. don't fork and become process group
+ leader, don't detach from controlling tty. Unix only.
+ * --pidfile FILE
+ On startup, write the process ID to FILE. Delete the FILE on exit.
+ Failure to create or delete the FILE is non-fatal. If no FILE
+ option is given, no PID file will be used. Unix only.
+ * --user USER[.GROUP]
+ After (optionally) writing the PID file, assume the user ID of
+ USER, and if included the GID of GROUP. Exit if the privileges are
+ not sufficient to do so. Unix only.
+ * configfile
+ If no configfile is included on the command line, Privoxy will
+ look for a file named "config" in the current directory (except on
+ Win32 where it will look for "config.txt" instead). Specify the
+ full path to avoid confusion.
+ _________________________________________________________________
+
+5. Contacting the Developers, Bug Reporting and Feature Requests
+
+ We value your feedback. However, to provide you with the best support,
+ please note:
+
+ * Use the [54]Sourceforge support forum to get help.
+ * Submit bugs only through our [55]Sourceforge bug forum. Make sure
+ that the bug has not already been submitted. Please try to verify
+ first that it is a Privoxy bug, and not a browser or site bug. If
+ you are using your own custom configuration, please try the stock
+ configs to see if the problem is a configuration related bug. And
+ if not using the latest development snapshot, please try the
+ latest one or, even better, the CVS sources.
+ * Submit feature requests only through our [56]Sourceforge feature
+ request forum.
+
+ For any other issues, feel free to use the [57]mailing lists.
+
+ Anyone interested in actively participating in development and related
+ discussions can join the appropriate mailing list [58]here. Archives
+ are available here too.
+ _________________________________________________________________
+
+6. Copyright and History
+
+6.1. License
+
+ Privoxy is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 2 of the License, or (at your
+ option) any later version.
+
+ This program is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details, which is available from
+ [59]the Free Software Foundation, Inc, 59 Temple Place - Suite 330,
+ Boston, MA 02111-1307, USA.
+ _________________________________________________________________
+
+6.2. History
+
+ Junkbuster was originally written by Anonymous Coders and
+ [60]Junkbuster's Corporation, and was released as free open-source
+ software under the GNU GPL. [61]Stefan Waldherr made many
+ improvements, and started the [62]SourceForge project Privoxy to
+ rekindle development. There are now several active developers
+ contributing. The last stable release was v2.0.2, which has now grown
+ whiskers ;-).
+ _________________________________________________________________
+
+7. See also
+
+ [63]http://sourceforge.net/projects/ijbswa
+
+ [64]http://ijbswa.sourceforge.net/
+
+ [65]http://i.j.b/
+
+ [66]http://www.junkbusters.com/ht/en/cookies.html
+
+ [67]http://www.waldherr.org/junkbuster/
+
+ [68]http://privacy.net/analyze/
+
+ [69]http://www.squid-cache.org/
+ _________________________________________________________________
+
+8. Appendix
+
+8.1. Regular Expressions
+
+ Privoxy can use "regular expressions" in various config files,
+ assuming support for "pcre" (Perl Compatible Regular Expressions) is
+ compiled in, which is the default. Such configuration directives do
+ not require regular expressions, but they can be used to increase
+ flexibility by matching a pattern with wild-cards against URLs.
+
+ If you are reading this, you probably don't understand what "regular
+ expressions" are, or what they can do. So this will be a very brief
+ introduction only. A full explanation would require a book ;-)
+
+ "Regular expressions" are a way of matching one character expression
+ against another to see if it matches or not. One of the "expressions"
+ is a literal string of readable characters (letters, numbers, etc.),
+ and the other is a complex string of literal characters combined with
+ wild-cards, and other special characters, called meta-characters. The
+ "meta-characters" have special meanings and are used to build the
+ complex pattern to be matched against. Perl Compatible Regular
+ Expressions is an enhanced form of the regular expression language
+ with backward compatibility.
+
+ To make a simple analogy, we do something similar when we use
+ wild-card characters when listing files with the dir command in DOS.
+ *.* matches all filenames. The "special" character here is the
+ asterisk which matches any and all characters. We can be more specific
+ and use ? to match just individual characters. So "dir file?.txt"
+ would match "file1.txt", "file2.txt", etc. We are pattern matching,
+ using a similar technique to "regular expressions"!
+
+ Regular expressions do essentially the same thing, but are much, much
+ more powerful. There are many more "special characters" and ways of
+ building complex patterns however. Let's look at a few of the common
+ ones, and then some examples:
+
+ . - Matches any single character, e.g. "a", "A", "4", ":", or "@".
+
+ ? - The preceding character or expression is matched ZERO or ONE
+ times. Either/or.
+
+ + - The preceding character or expression is matched ONE or MORE
+ times.
+
+ * - The preceding character or expression is matched ZERO or MORE
+ times.
+
+ \ - The "escape" character denotes that the following character should
+ be taken literally. This is used where one of the special characters
+ (e.g. ".") needs to be taken literally and not as a special
+ meta-character.
+
+ [] - Characters enclosed in brackets will be matched if any of the
+ enclosed characters are encountered.
+
+ () - parentheses are used to group a sub-expression, or multiple
+ sub-expressions.
+
+ | - The "bar" character works like an "or" conditional statement. A
+ match is successful if the sub-expression on either side of "|"
+ matches.
+
+ s/string1/string2/g - This is used to rewrite strings of text.
+ "string1" is replaced by "string2" in this example.
+
+ These are just some of the ones you are likely to use when matching
+ URLs with Privoxy, and are a long way from a definitive list. This is
+ enough to get us started with a few simple examples which may be more
+ illuminating:
+
+ /.*/banners/.* - A simple example that uses the common combination of
+ "." and "*" to denote any character, zero or more times. In other
+ words, any string at all. So we start with a literal forward slash,
+ then our regular expression pattern (".*") another literal forward
+ slash, the string "banners", another forward slash, and lastly another
+ ".*". We are building a directory path here. This will match any file
+ with the path that has a directory named "banners" in it. The ".*"
+ matches any characters, and this could conceivably be more forward
+ slashes, so it might expand into a much longer looking path. For
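+ example, this could match:
+ "/eye/hate/spammers/banners/annoy_me_please.gif", or just
+ "/ads/banners/annoying.html", or almost an infinite number of other
+ possible combinations, just so it has "banners" in the path somewhere.
+ (A bare "/banners/annoying.html" would not match, since the pattern
+ needs a "/", then ".*", then a second "/" before "banners".)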
+
+ And now something a little more complex:
+
+ /.*/adv((er)?ts?|ertis(ing|ements?))?/ - We have several literal
+ forward slashes again ("/"), so we are building another expression
+ that is a file path statement. We have another ".*", so we are
+ matching against any conceivable sub-path, just so it matches our
+ expression. The only true literal that must match our pattern is adv,
+ together with the forward slashes. What comes after the "adv" string
+ is the interesting part.
+
+ Remember the "?" means the preceding expression (either a literal
+ character or anything grouped with "(...)" in this case) can exist or
+ not, since this means either zero or one match. So
+ "((er)?ts?|ertis(ing|ements?))" is optional, as are the individual
+ sub-expressions: "(er)", "(ing|ements?)", and the "s". The "|" means
+ "or". We have two of those. For instance, "(ing|ements?)", can expand
+ to match either "ing" OR "ements?". What is being done here, is an
+ attempt at matching as many variations of "advertisement", and
+ similar, as possible. So this would expand to match just "adv", or
+ "advert", or "adverts", or "advertising", or "advertisement", or
+ "advertisements". You get the idea. But it would not match
+ "advertizements" (with a "z"). We could fix that by changing our
+ regular expression to: "/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/",
+ which would then match either spelling.
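+
+ The expansion described above is easy to check by experiment. The
+ following Python sketch uses Python's re module as a stand-in for
+ pcre, and assumes the pattern is matched from the start of the path;
+ the names ORIGINAL, REVISED and matches are invented for this example:
+
```python
import re

ORIGINAL = re.compile(r"/.*/adv((er)?ts?|ertis(ing|ements?))?/")
REVISED = re.compile(r"/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/")


def matches(regex, path):
    """Return True if the regex matches at the start of the path."""
    return regex.match(path) is not None


for word in ["adv", "advert", "adverts", "advertising",
             "advertisement", "advertisements", "advertizements"]:
    path = "/img/%s/banner.gif" % word
    print(word, matches(ORIGINAL, path), matches(REVISED, path))
```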
+
+ /.*/advert[0-9]+\.(gif|jpe?g) - Again another path statement with
+ forward slashes. Anything in the square brackets "[]" can be matched.
+ This is using "0-9" as a shorthand expression to mean any digit zero
+ through nine. It is the same as saying "0123456789". So any digit
+ matches. The "+" means one or more of the preceding expression must be
+ included. The preceding expression here is what is in the square
+ brackets -- in this case, any digit zero through nine. Then, at the
+ end, we have a grouping: "(gif|jpe?g)". This includes a "|", so this
+ needs to match the expression on either side of that bar character
+ also. A simple "gif" on one side, and the other side will in turn
+ match either "jpeg" or "jpg", since the "?" means the letter "e" is
+ optional and can be matched once or not at all. So we are building an
+ expression here to match a GIF or JPEG image file. It must
+ include the literal string "advert", then one or more digits, and a
+ "." (which is now a literal, and not a special character, since it is
+ escaped with "\"), and lastly either "gif", or "jpeg", or "jpg". Some
+ possible matches would include: "//advert1.jpg",
+ "/nasty/ads/advert1234.gif", "/banners/from/hell/advert99.jpg". It
+ would not match "advert1.gif" (no leading slash), or "/adverts232.jpg"
+ (the expression does not include an "s"), or "/advert1.jsp" ("jsp" is
+ not in the expression anywhere).
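+
+ The matches and non-matches listed above can be tried out in the same
+ way. Again a sketch using Python's "re" module, assuming the pattern is
+ applied from the start of the path; "is_advert_image" is a
+ hypothetical helper name.

```python
import re

ADVERT_IMAGE = re.compile(r"/.*/advert[0-9]+\.(gif|jpe?g)")


def is_advert_image(path):
    """Return True if the path matches the advert image pattern.

    Assumption: the pattern is anchored at the start of the path only.
    """
    return ADVERT_IMAGE.match(path) is not None


SAMPLES = [
    "//advert1.jpg",                     # matches: ".*" may be empty
    "/nasty/ads/advert1234.gif",         # matches
    "/banners/from/hell/advert99.jpg",   # matches
    "advert1.gif",                       # no leading slash
    "/adverts232.jpg",                   # "s" is not a digit
    "/advert1.jsp",                      # "jsp" is not an alternative
]

for sample in SAMPLES:
    print(sample, is_advert_image(sample))
```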
+
+ s/microsoft(?!\.com)/MicroSuck/i - This is a substitution. "MicroSuck"
+ will replace any occurrence of "microsoft". The "i" at the end of the
+ expression means ignore case. The "(?!\.com)" is a negative
+ look-ahead: the match should fail if "microsoft" is followed by
+ ".com". (The dot is escaped with "\" so that it is a literal; left
+ unescaped, it would stand for any single character.) In other words,
+ this acts like a "NOT" modifier. In case this is a hyperlink, we don't
+ want to break it ;-).
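+
+ The effect of the look-ahead can be seen in a small sketch. It uses
+ Python's "re.sub" in place of the "s/.../.../i" syntax, and it escapes
+ the dot ("\.") so that only a literal ".com" is excluded; with an
+ unescaped ".", any single character followed by "com" would also
+ suppress the replacement. The function name "rename" and the sample
+ sentence are made up for illustration.

```python
import re

# Case-insensitive "microsoft" that is NOT followed by a literal ".com".
PATTERN = re.compile(r"microsoft(?!\.com)", re.IGNORECASE)


def rename(text):
    """Replace every match of PATTERN, leaving host names like
    www.microsoft.com untouched."""
    return PATTERN.sub("MicroSuck", text)


print(rename("Microsoft released a patch; see http://www.microsoft.com/"))
```

+ The company name in the running text is replaced, while the host name
+ inside the link is left alone.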
+
+ We are barely scratching the surface of regular expressions here so
+ that you can understand the default Privoxy configuration files, and
+ maybe use this knowledge to customize your own installation. There is
+ much, much more that can be done with regular expressions. Now that
+ you know enough to get started, you can learn more on your own :/
+
+ More reading on Perl Compatible Regular expressions:
+ [70]http://www.perldoc.com/perl5.6/pod/perlre.html
+ _________________________________________________________________
+
+8.2. Privoxy's Internal Pages
+
+ Since Privoxy proxies each requested web page, it is easy for Privoxy
+ to trap certain URLs. In this way, we can talk directly to Privoxy,
+ and see how it is configured, see how our rules are being applied,
+ change these rules and other configuration options, and even turn
+ Privoxy's filtering off, all with a web browser.
+
+ The URLs listed below are the special ones that allow direct access to
+ Privoxy. Of course, Privoxy must be running to access these. If not,
+ you will get a friendly error message. Internet access is not
+ necessary either.
+
+ * Privoxy main page:
+
+ [71]http://ijbswa.sourceforge.net/config/
+ Alternatively, this may be reached at [72]http://i.j.b/, but this
+ variation may not work as reliably as the above in some
+ configurations.
+ * Show information about the current configuration:
+
+ [73]http://ijbswa.sourceforge.net/config/show-status
+ * Show the source code version numbers:
+
+ [74]http://ijbswa.sourceforge.net/config/show-version
+ * Show the client's request headers:
+
+ [75]http://ijbswa.sourceforge.net/config/show-request
+ * Show which actions apply to a URL and why:
+
+ [76]http://ijbswa.sourceforge.net/config/show-url-info
+ * Toggle Privoxy on or off:
+
+ [77]http://ijbswa.sourceforge.net/config/toggle
+ Short cuts. Turn off, then on:
+
+ [78]http://ijbswa.sourceforge.net/config/toggle?set=disable
+
+ [79]http://ijbswa.sourceforge.net/config/toggle?set=enable
+ * Edit the actions list file:
+
+ [80]http://ijbswa.sourceforge.net/config/edit-actions
+
+ These may be bookmarked for quick reference.
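+
+ For scripted access, the URLs above can be built and fetched from a
+ small program. This is a sketch only: the helper names are
+ hypothetical, and the proxy address "127.0.0.1:8118" is an assumption
+ that must be replaced with the address your Privoxy actually listens
+ on. The request has to go through Privoxy, since Privoxy is what
+ traps these URLs.

```python
import urllib.request
from urllib.parse import urlencode

CONFIG_BASE = "http://ijbswa.sourceforge.net/config/"


def internal_url(page="", **params):
    """Build the URL of one of the internal pages listed above."""
    url = CONFIG_BASE + page
    if params:
        url += "?" + urlencode(params)
    return url


def fetch_via_proxy(url, proxy="127.0.0.1:8118"):
    """Fetch a URL through the proxy and return the body as text.

    Assumption: the proxy address is a placeholder, not a documented
    default of this manual.
    """
    handler = urllib.request.ProxyHandler({"http": "http://" + proxy})
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


print(internal_url("show-status"))
print(internal_url("toggle", set="disable"))
```

+ Calling "fetch_via_proxy(internal_url("show-status"))" would then
+ return the status page, provided Privoxy is running at that address.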
+ _________________________________________________________________
+
+8.3. Anatomy of an Action
+
+ The way Privoxy applies "actions" to any given URL can be complex, and
+ it is not always easy to understand what is happening. Sometimes we
+ need to be able to see just what Privoxy is doing, especially if
+ something Privoxy is doing is inadvertently causing us a problem. It
+ can be a little daunting to look at the actions files themselves,
+ since they tend to be filled with "regular expressions" whose
+ consequences are not always so obvious. Privoxy provides the
+ [81]http://ijbswa.sourceforge.net/config/show-url-info page that can
+ show us very specifically how actions are being applied to any given
+ URL. This is a big help for troubleshooting.
+
+ First, enter one URL (or partial URL) at the prompt, and then Privoxy
+ will tell us how the current configuration will handle it. This will not
+ help with filtering effects from the default.filter! It also will not
+ tell you about any other URLs that may be embedded within the URL you
+ are testing. For instance, images such as ads are expressed as URLs
+ within the raw page source of HTML pages. So you will only get info
+ for the actual URL that is pasted into the prompt area -- not any
+ sub-URLs. If you want to know about embedded URLs like ads, you will
+ have to dig those out of the HTML source. Use your browser's "View
+ Page Source" option for this.
+
+ Let's look at an example, [82]google.com, one section at a time:
+
+ System default actions:
+
+ { -add-header -block -deanimate-gifs -downgrade -fast-redirects -filter
+ -hide-forwarded -hide-from -hide-referer -hide-user-agent -image
+ -image-blocker -limit-connect -no-compression -no-cookies-keep
+ -no-cookies-read -no-cookies-set -no-popups -vanilla-wafer -wafer }
+
+
+ This is the top section, and only tells us of the compiled in
+ defaults. This is basically what Privoxy would do if there were not
+ any "actions" defined, i.e. it does nothing. Every action is disabled.
+ This is not particularly informative for our purposes here. OK, next
+ section:
+
+ Matches for http://google.com:
+
+ { -add-header -block +deanimate-gifs -downgrade +fast-redirects
+ +filter{html-annoyances} +filter{js-annoyances} +filter{no-popups}
+ +filter{webbugs} +filter{nimda} +filter{banners-by-size} +filter{hal}
+ +filter{fun} +hide-forwarded +hide-from{block} +hide-referer{forge}
+ -hide-user-agent -image +image-blocker{blank} +no-compression
+ +no-cookies-keep -no-cookies-read -no-cookies-set +no-popups
+ -vanilla-wafer -wafer }
+ /
+
+ { -no-cookies-keep -no-cookies-read -no-cookies-set }
+ .google.com
+
+ { -fast-redirects }
+ .google.com
+
+
+ This is much more informative, and tells us how we have defined our
+ "actions", and which ones match for our example, "google.com". The
+ first grouping shows our default settings, which would apply to all
+ URLs. If you look at your "actions" file, this would be the section
+ just below the "aliases" section near the top. This applies to all
+ URLs as signified by the single forward slash -- "/".
+
+ These are the default actions we have enabled. But we can define
+ additional actions that would be exceptions to these general rules,
+ and then list specific URLs that these exceptions would apply to. Last
+ match wins. Just below this, then, are two explicit matches for
+ ".google.com". The first is negating our various cookie blocking
+ actions (i.e. we will allow cookies here). The second is turning off
+ "fast-redirects" for this site. Note that there is a leading dot here
+ -- ".google.com". This will match "google.com" itself as well as any
+ hosts and sub-domains in the google.com domain, such as
+ "www.google.com". So, apparently, we have these actions defined
+ somewhere in the lower part of our actions file, and "google.com" is
+ referenced in these sections.
+
+ And now we pull it all together in the bottom section and summarize how
+ Privoxy is applying all its "actions" to "google.com":
+
+ Final results:
+
+ -add-header -block +deanimate-gifs -downgrade -fast-redirects
+ +filter{html-annoyances} +filter{js-annoyances} +filter{no-popups}
+ +filter{webbugs} +filter{nimda} +filter{banners-by-size} +filter{hal}
+ +filter{fun} +hide-forwarded +hide-from{block} +hide-referer{forge}
+ -hide-user-agent -image +image-blocker{blank} -limit-connect +no-compression
+ -no-cookies-keep -no-cookies-read -no-cookies-set +no-popups -vanilla-wafer
+ -wafer
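+
+ The "last match wins" rule behind these final results can be sketched
+ as a simple merge. This is an illustration, not Privoxy's code: the
+ function name "merge_actions" is hypothetical, any "{parameter}" is
+ ignored, and only a handful of the actions from the example are
+ included.

```python
def merge_actions(sections):
    """Apply matching sections in file order.

    Each section is a string of "+name" / "-name" tokens. A later
    setting for the same action overrides an earlier one, so the last
    matching section wins. Simplification: "{parameter}" suffixes are
    dropped and every action is treated as a plain on/off switch.
    """
    state = {}
    for section in sections:
        for token in section.split():
            enabled = token.startswith("+")
            name = token[1:].split("{", 1)[0]
            state[name] = enabled
    return state


# A subset of the sections that matched "google.com", in file order.
SECTIONS = [
    "+fast-redirects +no-cookies-keep -no-cookies-read -no-cookies-set +no-popups",  # "/"
    "-no-cookies-keep -no-cookies-read -no-cookies-set",                             # ".google.com"
    "-fast-redirects",                                                               # ".google.com"
]

for name, enabled in sorted(merge_actions(SECTIONS).items()):
    print(("+" if enabled else "-") + name)
```

+ The cookie actions and "fast-redirects" end up disabled because the
+ later ".google.com" sections override the defaults, while "no-popups"
+ keeps its default setting.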
+
+
+ Now another example, "ad.doubleclick.net":
+
+ { +block +image }
+ .ad.doubleclick.net
+
+ { +block +image }
+ ad*.
+
+ { +block +image }
+ .doubleclick.net
+
+
+ We'll just show the interesting part here, the explicit matches. It is
+ matched three different times, each time as "+block +image", which is
+ the expanded form of one of our aliases that had been defined as:
+ "+imageblock". ("Aliases" are defined in the first section of the
+ actions file and typically used to combine more than one action.)
+
+ Any one of these would have done the trick and blocked this as an
+ unwanted image. This is unnecessarily redundant since the last case
+ effectively would also cover the first. No point in taking chances
+ with these guys though ;-) Note that if you want an ad or obnoxious
+ URL to be invisible, it should be defined as "ad.doubleclick.net" is
+ done here -- as both a "+block" and an "+image". The custom alias
+ "+imageblock" does this for us.
+
+ One last example. Let's try "http://www.rhapsodyk.net/adsl/HOWTO/".
+ This one is giving us problems. We are getting a blank page. Hmmm...
+
+ Matches for http://www.rhapsodyk.net/adsl/HOWTO/:
+
+ { -add-header -block +deanimate-gifs -downgrade +fast-redirects
+ +filter{html-annoyances} +filter{js-annoyances} +filter{no-popups}
+ +filter{webbugs} +filter{nimda} +filter{banners-by-size} +filter{hal}
+ +filter{fun} +hide-forwarded +hide-from{block} +hide-referer{forge}
+ -hide-user-agent -image +image-blocker{blank} +no-compression
+ +no-cookies-keep -no-cookies-read -no-cookies-set +no-popups
+ -vanilla-wafer -wafer }
+ /
+
+ { +block +image }
+ /ads
+
+
+ Ooops, the "/adsl/" is matching "/ads"! But we did not want this at
+ all! Now we see why we get the blank page. We could now add a new
+ action below this that explicitly does not block (-block) pages with
+ "adsl". There are various ways to handle such exceptions. Example:
+
+ { -block }
+ /adsl
+
+
+ Now the page displays ;-)
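+
+ Why "/ads" catches "/adsl/", and why the later "-block" section fixes
+ it, can be sketched as follows. This is an approximation only: it
+ assumes a path pattern is matched from the start of the path and is
+ not anchored at the end, and it uses Python's "re" module and a
+ hypothetical "final_actions" helper rather than Privoxy itself.

```python
import re

# (path pattern, actions) pairs in file order; later entries win.
RULES = [
    ("/ads", {"block": True, "image": True}),
    ("/adsl", {"block": False}),
]


def final_actions(path, rules):
    """Merge the actions of every rule whose pattern matches the path.

    Assumption: patterns are anchored at the start of the path only,
    so "/ads" also matches any path that merely begins with "/ads".
    """
    state = {}
    for pattern, actions in rules:
        if re.match(pattern, path):
            state.update(actions)
    return state


print(final_actions("/adsl/HOWTO/", RULES[:1]))   # before the fix
print(final_actions("/adsl/HOWTO/", RULES))       # with the "-block" rule
print(final_actions("/ads/banner.gif", RULES))    # real ads stay blocked
```

+ With only the first rule, "/adsl/HOWTO/" is blocked; once the "/adsl"
+ exception follows it, "block" is switched off for that path, while
+ "/ads/banner.gif" is still blocked.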
+
+References
+
+ Visible links
+ 1. http://ijbswa.sourceforge.net/user-manual/
+ 2. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#INTRODUCTION
+ 3. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN28
+ 4. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#INSTALLATION
+ 5. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#INSTALLATION-SOURCE
+ 6. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#INSTALLATION-RH
+ 7. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#INSTALLATION-SUSE
+ 8. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#INSTALLATION-OS2
+ 9. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#INSTALLATION-WIN
+ 10. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#INSTALLATION-OTHER
+ 11. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#CONFIGURATION
+ 12. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN147
+ 13. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN165
+ 14. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN196
+ 15. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN229
+ 16. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN322
+ 17. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN459
+ 18. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN547
+ 19. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN656
+ 20. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#ACTIONSFILE
+ 21. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN754
+ 22. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN828
+ 23. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN1148
+ 24. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#FILTERFILE
+ 25. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN1207
+ 26. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#QUICKSTART
+ 27. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN1263
+ 28. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#CONTACT
+ 29. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#COPYRIGHT
+ 30. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN1322
+ 31. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN1328
+ 32. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#SEEALSO
+ 33. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#APPENDIX
+ 34. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#REGEX
+ 35. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN1512
+ 36. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#ACTIONSANAT
+ 37. http://i.j.b/
+ 38. http://sourceforge.net/projects/ijbswa/
+ 39. http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/ijbswa/current/
+ 40. http://www.gnu.org/
+ 41. http://i.j.b/
+ 42. http://ijbswa.sourceforge.net/config/
+ 43. http://i.j.b/
+ 44. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#ACTIONSFILE
+ 45. http://i.j.b/
+ 46. http://i.j.b/
+ 47. http://i.j.b/
+ 48. http://i.j.b/
+ 49. http://i.j.b/show-url-info
+ 50. http://www.perldoc.com/perl5.6/pod/perlre.html
+ 51. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#REGEX
+ 52. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#ACTIONSANAT
+ 53. http://i.j.b/
+ 54. http://sourceforge.net/tracker/?group_id=11118&atid=211118
+ 55. http://sourceforge.net/tracker/?group_id=11118&atid=111118
+ 56. http://sourceforge.net/tracker/?atid=361118&group_id=11118&func=browse
+ 57. http://sourceforge.net/mail/?group_id=11118
+ 58. http://sourceforge.net/mail/?group_id=11118
+ 59. http://www.gnu.org/copyleft/gpl.html
+ 60. http://www.junkbusters.com/ht/en/ijbfaq.html
+ 61. http://www.waldherr.org/junkbuster/
+ 62. http://sourceforge.net/projects/ijbswa/
+ 63. http://sourceforge.net/projects/ijbswa
+ 64. http://ijbswa.sourceforge.net/
+ 65. http://i.j.b/
+ 66. http://www.junkbusters.com/ht/en/cookies.html
+ 67. http://www.waldherr.org/junkbuster/
+ 68. http://privacy.net/analyze/
+ 69. http://www.squid-cache.org/
+ 70. http://www.perldoc.com/perl5.6/pod/perlre.html
+ 71. http://ijbswa.sourceforge.net/config/
+ 72. http://i.j.b/
+ 73. http://ijbswa.sourceforge.net/config/show-status
+ 74. http://ijbswa.sourceforge.net/config/show-version
+ 75. http://ijbswa.sourceforge.net/config/show-request
+ 76. http://ijbswa.sourceforge.net/config/show-url-info
+ 77. http://ijbswa.sourceforge.net/config/toggle
+ 78. http://ijbswa.sourceforge.net/config/toggle?set=disable
+ 79. http://ijbswa.sourceforge.net/config/toggle?set=enable
+ 80. http://ijbswa.sourceforge.net/config/edit-actions
+ 81. http://ijbswa.sourceforge.net/config/show-url-info
+ 82. http://google.com/
+
+ Hidden links:
+ 83. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN1384
+ 84. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN1392
+ 85. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN1395
+ 86. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN1398
+ 87. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN1401
+ 88. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN1406
+ 89. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN1409
+ 90. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN1412
+ 91. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN1418