Privoxy User Manual
Prev

9. Appendix

9.1. Regular Expressions

Privoxy can use "regular expressions" in various config files, assuming support for "pcre" (Perl Compatible Regular Expressions) is compiled in, which is the default. Configuration directives do not require regular expressions, but they can be used to increase flexibility by matching a pattern with wild-cards against URLs.

If you are reading this, you probably don't understand what "regular - expressions" are, or what they can do. So this will be a very brief - introduction only. A full explanation would require a book ;-)

"Regular expressions" is a way of matching one character - expression against another to see if it matches or not. One of the - "expressions" is a literal string of readable characters - (letter, numbers, etc), and the other is a complex string of literal - characters combined with wild-cards, and other special characters, called - meta-characters. The "meta-characters" have special meanings and - are used to build the complex pattern to be matched against. Perl Compatible - Regular Expressions is an enhanced form of the regular expression language - with backward compatibility.

To make a simple analogy, we do something similar when we use wild-card characters when listing files with the dir command in DOS. *.* matches all filenames. The "special" character here is the asterisk which matches any and all characters. We can be more specific and use ? to match just individual characters. So "dir file?.txt" would match "file1.txt", "file2.txt", etc. We are pattern matching, using a similar technique to "regular expressions"!

Regular expressions do essentially the same thing, but are much, much more - powerful. There are many more "special characters" and ways of - building complex patterns however. Let's look at a few of the common ones, - and then some examples:

. - Matches any single character, e.g. "a", - "A", "4", ":", or "@". -

? - The preceding character or expression is matched ZERO or ONE - times. Either/or. -

+ - The preceding character or expression is matched ONE or MORE - times. -

* - The preceding character or expression is matched ZERO or MORE - times. -

\ - The "escape" character denotes that - the following character should be taken literally. This is used where one of the - special characters (e.g. ".") needs to be taken literally and - not as a special meta-character. -

[] - Characters enclosed in brackets will be matched if - any of the enclosed characters are encountered. -

() - parentheses are used to group a sub-expression, - or multiple sub-expressions. -

| - The "bar" character works like an - "or" conditional statement. A match is successful if the - sub-expression on either side of "|" matches. -

s/string1/string2/g - This is used to rewrite strings of text. - "string1" is replaced by "string2" in this - example. -

These are just some of the ones you are likely to use when matching URLs with Privoxy, and are a long way from a definitive list. This is enough to get us started with a few simple examples which may be more illuminating:

/.*/banners/.* - A simple example - that uses the common combination of "." and "*" to - denote any character, zero or more times. In other words, any string at all. - So we start with a literal forward slash, then our regular expression pattern - (".*") another literal forward slash, the string - "banners", another forward slash, and lastly another - ".*". We are building - a directory path here. This will match any file with the path that has a - directory named "banners" in it. The ".*" matches - any characters, and this could conceivably be more forward slashes, so it - might expand into a much longer looking path. For example, this could match: - "/eye/hate/spammers/banners/annoy_me_please.gif", or just - "/banners/annoying.html", or almost an infinite number of other - possible combinations, just so it has "banners" in the path - somewhere.

And now something a little more complex:

/.*/adv((er)?ts?|ertis(ing|ements?))?/ - - We have several literal forward slashes again ("/"), so we are - building another expression that is a file path statement. We have another - ".*", so we are matching against any conceivable sub-path, just so - it matches our expression. The only true literal that must - match our pattern is adv, together with - the forward slashes. What comes after the "adv" string is the - interesting part.

Remember the "?" means the preceding expression (either a - literal character or anything grouped with "(...)" in this case) - can exist or not, since this means either zero or one match. So - "((er)?ts?|ertis(ing|ements?))" is optional, as are the - individual sub-expressions: "(er)", - "(ing|ements?)", and the "s". The "|" - means "or". We have two of those. For instance, - "(ing|ements?)", can expand to match either "ing" - OR "ements?". What is being done here, is an - attempt at matching as many variations of "advertisement", and - similar, as possible. So this would expand to match just "adv", - or "advert", or "adverts", or - "advertising", or "advertisement", or - "advertisements". You get the idea. But it would not match - "advertizements" (with a "z"). We could fix that by - changing our regular expression to: - "/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/", which would then match - either spelling.

/.*/advert[0-9]+\.(gif|jpe?g) - Again another path statement with forward slashes. Anything in the square brackets "[]" can be matched. This is using "0-9" as a shorthand expression to mean any digit zero through nine. It is the same as saying "0123456789". So any digit matches. The "+" means one or more of the preceding expression must be included. The preceding expression here is what is in the square brackets -- in this case, any digit zero through nine. Then, at the end, we have a grouping: "(gif|jpe?g)". This includes a "|", so this needs to match the expression on either side of that bar character also. A simple "gif" on one side, and the other side will in turn match either "jpeg" or "jpg", since the "?" means the letter "e" is optional and can be matched once or not at all. So we are building an expression here to match a GIF or JPEG image file. It must include the literal string "advert", then one or more digits, and a "." (which is now a literal, and not a special character, since it is escaped with "\"), and lastly either "gif", or "jpeg", or "jpg". Some possible matches would include: "//advert1.jpg", "/nasty/ads/advert1234.gif", "/banners/from/hell/advert99.jpg". It would not match "advert1.gif" (no leading slash), or "/adverts232.jpg" (the expression does not include an "s"), or "/advert1.jsp" ("jsp" is not in the expression anywhere).

s/microsoft(?!\.com)/MicroSuck/i - This is a substitution. "MicroSuck" will replace any occurrence of "microsoft". The "i" at the end of the expression means ignore case. The "(?!\.com)" means the match should fail if "microsoft" is followed by ".com". The period is escaped so that it is taken as a literal period and not as the "any single character" meta-character. In other words, this acts like a "NOT" modifier. In case this is a hyperlink, we don't want to break it ;-).
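To see the "NOT" behaviour in action, here is a minimal sketch using Python's re module as a stand-in for the Perl-style substitution. The function name and the sample strings are made up for illustration, the period in the look-ahead is written escaped so it only stands for a literal dot, and note that re.sub replaces every occurrence by default:

```python
import re

# Negative look-ahead: "microsoft" only matches when NOT followed by ".com".
PATTERN = re.compile(r"microsoft(?!\.com)", re.IGNORECASE)

def rewrite(text):
    """Replace every match of PATTERN (hypothetical helper for this demo)."""
    return PATTERN.sub("MicroSuck", text)

print(rewrite("Microsoft Windows"))            # the name is rewritten
print(rewrite("http://www.microsoft.com/"))    # the hyperlink is left alone
```

The second call returns its input unchanged, which is exactly why the look-ahead is there.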

We are barely scratching the surface of regular expressions here so that you - can understand the default Privoxy - configuration files, and maybe use this knowledge to customize your own - installation. There is much, much more that can be done with regular - expressions. Now that you know enough to get started, you can learn more on - your own :/

More reading on Perl Compatible Regular expressions: - http://www.perldoc.com/perl5.6/pod/perlre.html

9.2. Privoxy's Internal Pages

Since Privoxy proxies each requested - web page, it is easy for Privoxy to - trap certain special URLs. In this way, we can talk directly to - Privoxy, and see how it is - configured, see how our rules are being applied, change these - rules and other configuration options, and even turn - Privoxy's filtering off, all with - a web browser.

The URLs listed below are the special ones that allow direct access - to Privoxy. Of course, - Privoxy must be running to access these. If - not, you will get a friendly error message. Internet access is not - necessary either.

- Privoxy main page: -
- http://config.privoxy.org/ -
Alternately, this may be reached at http://p.p/, but this - variation may not work as reliably as the above in some configurations. -
- Show information about the current configuration: -
- http://config.privoxy.org/show-status -
- Show the source code version numbers: -
- http://config.privoxy.org/show-version -
- Show the client's request headers: -
- http://config.privoxy.org/show-request -
- Show which actions apply to a URL and why: -
- http://config.privoxy.org/show-url-info -
- Toggle Privoxy on or off. In this case, "Privoxy" continues - to run, but only as a pass-through proxy, with no actions taking place: -
- http://config.privoxy.org/toggle -
Short cuts. Turn off, then on: -
- http://config.privoxy.org/toggle?set=disable -
- http://config.privoxy.org/toggle?set=enable -
- Edit the actions list file: -
- http://config.privoxy.org/edit-actions -

These may be bookmarked for quick reference.
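If you would rather script these pages than bookmark them, the following is a rough sketch only. The helper names are invented for this example, and the proxy address 127.0.0.1:8118 is an assumption about your setup; use whatever address your Privoxy actually listens on. Nothing is fetched until fetch_internal is called.

```python
import urllib.parse
import urllib.request

CONFIG_BASE = "http://config.privoxy.org/"
# Assumption: Privoxy is reachable here; adjust to your own configuration.
PROXY = "http://127.0.0.1:8118"

def internal_url(page="", **params):
    """Build the URL of one of the special pages listed above."""
    query = urllib.parse.urlencode(params)
    return CONFIG_BASE + page + ("?" + query if query else "")

def fetch_internal(page="", **params):
    """Request a special page through the proxy and return its body as text."""
    handler = urllib.request.ProxyHandler({"http": PROXY})
    opener = urllib.request.build_opener(handler)
    with opener.open(internal_url(page, **params), timeout=5) as response:
        return response.read().decode("utf-8", errors="replace")

print(internal_url("show-status"))
print(internal_url("toggle", set="disable"))
```

The request has to go through Privoxy for the special URLs to be trapped, which is why the sketch installs a ProxyHandler instead of connecting directly.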

9.2.1. Bookmarklets

Below are some "bookmarklets" to allow you to easily access a - "mini" version of some of Privoxy's - special pages. They are designed for MS Internet Explorer, but should work - equally well in Netscape, Mozilla, and other browsers which support - JavaScript. They are designed to run directly from your bookmarks - not by - clicking the links below (although that should work for testing).

To save them, right-click the link and choose "Add to Favorites" - (IE) or "Add Bookmark" (Netscape). You will get a warning that - the bookmark "may not be safe" - just click OK. Then you can run the - Bookmarklet directly from your favourites/bookmarks. For even faster access, - you can put them on the "Links" bar (IE) or the "Personal - Toolbar" (Netscape), and run them with a single click.

Enable Privoxy -
Disable Privoxy -
Toggle Privoxy (Toggles between enabled and disabled) -
View Privoxy Status -
Actions file feedback system -

Credit: The site which gave me the general idea for these bookmarklets is - www.bookmarklets.com. They - have more information about bookmarklets.

9.3. Anatomy of an Action

The way Privoxy applies "actions" and "filters" to any given URL can be complex, and it is not always easy to understand what is happening. And sometimes we need to be able to see just what Privoxy is doing, especially if something Privoxy is doing is inadvertently causing us a problem. It can be a little daunting to look at the actions and filters files themselves, since they tend to be filled with "regular expressions" whose consequences are not always so obvious. Privoxy provides the http://config.privoxy.org/show-url-info page that can show us very specifically how actions are being applied to any given URL. This is a big help for troubleshooting.

First, enter one URL (or partial URL) at the prompt, and then - Privoxy will tell us - how the current configuration will handle it. This will not - help with filtering effects from the default.filter file! It - also will not tell you about any other URLs that may be embedded within the - URL you are testing. For instance, images such as ads are expressed as URLs - within the raw page source of HTML pages. So you will only get info for the - actual URL that is pasted into the prompt area -- not any sub-URLs. If you - want to know about embedded URLs like ads, you will have to dig those out of - the HTML source. Use your browser's "View Page Source" option - for this. Or right click on the ad, and grab the URL.

Let's look at an example, google.com, - one section at a time:

System default actions: - - { -add-header -block -deanimate-gifs -downgrade -fast-redirects -filter - -hide-forwarded -hide-from -hide-referer -hide-user-agent -image - -image-blocker -limit-connect -no-compression -no-cookies-keep - -no-cookies-read -no-cookies-set -no-popups -vanilla-wafer -wafer } - -

This is the top section, and only tells us of the compiled in defaults. This - is basically what Privoxy would do if there - were not any "actions" defined, i.e. it does nothing. Every action - is disabled. This is not particularly informative for our purposes here. OK, - next section:

Matches for http://google.com: - - { -add-header -block +deanimate-gifs -downgrade +fast-redirects - +filter{html-annoyances} +filter{js-annoyances} +filter{no-popups} - +filter{webbugs} +filter{nimda} +filter{banners-by-size} +filter{hal} - +filter{fun} +hide-forwarded +hide-from{block} +hide-referer{forge} - -hide-user-agent -image +image-blocker{blank} +no-compression - +no-cookies-keep -no-cookies-read -no-cookies-set +no-popups - -vanilla-wafer -wafer } - / + + + + + Appendix + + + + + + + + + +
+ + + + + + + + + + + + +
Privoxy 3.0.21 User Manual
Prev
+
+
+ +
+
14. Appendix
+ +
+
14.1. Regular + Expressions
+ +
Privoxy uses Perl-style + "regular expressions" in its actions files and filter file, through the PCRE and PCRS libraries.
+ +
If you are reading this, you probably don't understand what + "regular expressions" are, or what they can + do. So this will be a very brief introduction only. A full explanation + would require a book ;-)
+ +
Regular expressions provide a language to describe patterns that can + be run against strings of characters (letter, numbers, etc), to see if + they match the string or not. The patterns are themselves (sometimes + complex) strings of literal characters, combined with wild-cards, and + other special characters, called meta-characters. The "meta-characters" have special meanings and are used to + build complex patterns to be matched against. Perl Compatible Regular + Expressions are an especially convenient "dialect" of the regular expression language.
+ +
To make a simple analogy, we do something similar when we use wild-card characters when listing files with the dir command in DOS. *.* matches all filenames. The "special" character here is the asterisk which matches any and all characters. We can be more specific and use ? to match just individual characters. So "dir file?.txt" would match "file1.txt", "file2.txt", etc. We are pattern matching, using a similar technique to "regular expressions"!
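The wild-card analogy can be tried out without a DOS prompt. This small sketch uses Python's fnmatch module as a stand-in for the dir command's wild-cards (an assumption made for illustration; the two are similar in spirit but not identical), with a made-up list of filenames:

```python
from fnmatch import fnmatchcase

# Hypothetical directory listing for the demo.
names = ["file1.txt", "file2.txt", "file10.txt", "notes.txt"]

# "?" stands for exactly one character, so "file10.txt" is left out.
matched = [name for name in names if fnmatchcase(name, "file?.txt")]
print(matched)
```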
+ +
Regular expressions do essentially the same thing, but are much, + much more powerful. There are many more "special + characters" and ways of building complex patterns however. Let's + look at a few of the common ones, and then some examples:
+ + + + + + + +
. - + Matches any single character, e.g. "a", "A", "4", ":", or + "@".
+ + + + + + + +
? - The + preceding character or expression is matched ZERO or ONE times. + Either/or.
+ + + + + + + +
+ - The + preceding character or expression is matched ONE or MORE + times.
+ + + + + + + +
* - The + preceding character or expression is matched ZERO or MORE + times.
+ + + + + + + +
\ - The + "escape" character denotes that the + following character should be taken literally. This is used where + one of the special characters (e.g. ".") needs to be taken literally and not as a + special meta-character. Example: "example\.com", makes sure the period is + recognized only as a period (and not expanded to its + meta-character meaning of any single character).
+ + + + + + + +
[ ] - Characters enclosed in brackets will be matched if any of the enclosed characters are encountered. For instance, "[0-9]" matches any numeric digit (zero through nine). As an example, we can combine this with "+" to match any digit one or more times: "[0-9]+".
+ + + + + + + +
( ) - + parentheses are used to group a sub-expression, or multiple + sub-expressions.
+ + + + + + + +
| - The + "bar" character works like an + "or" conditional statement. A match is + successful if the sub-expression on either side of "|" matches. As an example: "/(this|that) example/" uses grouping and the bar + character and would match either "this + example" or "that example", and + nothing else.
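The meta-characters listed above can be tried out one at a time. The following is a minimal sketch in Python, whose re module handles these basic constructs the same way Perl-style expressions do. The helper name and the test strings are our own, and fullmatch is used so that each pattern has to cover the whole string:

```python
import re

def check(pattern, text):
    """Return True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# (pattern, text, expected result)
CASES = [
    (r"a.c", "abc", True),                    # "." matches any single character
    (r"a.c", "ac", False),                    # ... but exactly one is required
    (r"colou?r", "color", True),              # "?" makes the "u" optional
    (r"[0-9]+", "2024", True),                # "+" needs one or more digits
    (r"[0-9]+", "", False),
    (r"[0-9]*", "", True),                    # "*" also accepts zero digits
    (r"example\.com", "example.com", True),   # escaped "." is a literal period
    (r"example\.com", "exampleXcom", False),
    (r"(this|that) example", "that example", True),    # grouping and "|"
    (r"(this|that) example", "other example", False),
]

for pattern, text, expected in CASES:
    assert check(pattern, text) == expected
    print(pattern, repr(text), "->", check(pattern, text))
```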
+ +
These are just some of the ones you are likely to use when matching URLs with Privoxy, and are a long way from a definitive list. This is enough to get us started with a few simple examples which may be more illuminating:
+ +
/.*/banners/.* - A simple example that uses + the common combination of "." and + "*" to denote any character, zero or more + times. In other words, any string at all. So we start with a literal + forward slash, then our regular expression pattern (".*") another literal forward slash, the string + "banners", another forward slash, and lastly + another ".*". We are building a directory + path here. This will match any file with the path that has a directory + named "banners" in it. The ".*" matches any characters, and this could conceivably + be more forward slashes, so it might expand into a much longer looking + path. For example, this could match: "/eye/hate/spammers/banners/annoy_me_please.gif", or + just "/banners/annoying.html", or almost an + infinite number of other possible combinations, just so it has + "banners" in the path somewhere.
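Here is a small sketch of the banners pattern at work, using Python's re module as a stand-in. It assumes the pattern is applied from the start of the path (hence re.match), and the second and third sample paths are invented for the demonstration:

```python
import re

BANNERS = re.compile(r"/.*/banners/.*")

def is_banner_path(path):
    """True if the path matches the pattern, starting at its first character."""
    return BANNERS.match(path) is not None

for path in [
    "/eye/hate/spammers/banners/annoy_me_please.gif",
    "/ads/banners/top.gif",
    "/images/logo.gif",
]:
    print(path, "->", is_banner_path(path))
```

Only the last path is rejected, since it has no "banners" directory in it.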
+ +
And now something a little more complex:
+ +
/.*/adv((er)?ts?|ertis(ing|ements?))?/ - We + have several literal forward slashes again ("/"), so we are building another expression that is a + file path statement. We have another ".*", + so we are matching against any conceivable sub-path, just so it matches + our expression. The only true literal that must match our pattern is + adv, together with the forward + slashes. What comes after the "adv" string + is the interesting part.
+ +
Remember the "?" means the preceding + expression (either a literal character or anything grouped with + "(...)" in this case) can exist or not, + since this means either zero or one match. So "((er)?ts?|ertis(ing|ements?))" is optional, as are the + individual sub-expressions: "(er)", + "(ing|ements?)", and the "s". The "|" means + "or". We have two of those. For instance, + "(ing|ements?)", can expand to match either + "ing" OR "ements?". What is + being done here, is an attempt at matching as many variations of + "advertisement", and similar, as possible. + So this would expand to match just "adv", or + "advert", or "adverts", or "advertising", + or "advertisement", or "advertisements". You get the idea. But it would not + match "advertizements" (with a "z"). We could fix that by changing our regular + expression to: "/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/", which + would then match either spelling.
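The expansion described above is easy to check mechanically. This is a sketch in Python (re.match anchors at the start of the path, which is an assumption about how the pattern is applied); the "/x/" prefix in the sample paths is arbitrary:

```python
import re

ORIGINAL = re.compile(r"/.*/adv((er)?ts?|ertis(ing|ements?))?/")
WITH_Z = re.compile(r"/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/")

WORDS = ["adv", "advert", "adverts", "advertising",
         "advertisement", "advertisements", "advertizements"]

for word in WORDS:
    path = "/x/" + word + "/"
    print(word,
          "original:", ORIGINAL.match(path) is not None,
          "with z:", WITH_Z.match(path) is not None)
```

Every word matches both patterns except "advertizements", which only the second pattern accepts.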
+ +
/.*/advert[0-9]+\.(gif|jpe?g) - Again another path statement with forward slashes. Anything in the square brackets "[ ]" can be matched. This is using "0-9" as a shorthand expression to mean any digit zero through nine. It is the same as saying "0123456789". So any digit matches. The "+" means one or more of the preceding expression must be included. The preceding expression here is what is in the square brackets -- in this case, any digit zero through nine. Then, at the end, we have a grouping: "(gif|jpe?g)". This includes a "|", so this needs to match the expression on either side of that bar character also. A simple "gif" on one side, and the other side will in turn match either "jpeg" or "jpg", since the "?" means the letter "e" is optional and can be matched once or not at all. So we are building an expression here to match a GIF or JPEG image file. It must include the literal string "advert", then one or more digits, and a "." (which is now a literal, and not a special character, since it is escaped with "\"), and lastly either "gif", or "jpeg", or "jpg". Some possible matches would include: "//advert1.jpg", "/nasty/ads/advert1234.gif", "/banners/from/hell/advert99.jpg". It would not match "advert1.gif" (no leading slash), or "/adverts232.jpg" (the expression does not include an "s"), or "/advert1.jsp" ("jsp" is not in the expression anywhere).
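The matches and non-matches listed above can be verified with a few lines of Python. As before, this is only a sketch: re.match is assumed to mirror how the pattern is anchored at the start of the path, and the helper name is ours:

```python
import re

ADVERT = re.compile(r"/.*/advert[0-9]+\.(gif|jpe?g)")

def is_advert(path):
    """True if the pattern matches, starting at the first character of the path."""
    return ADVERT.match(path) is not None

SHOULD_MATCH = ["//advert1.jpg",
                "/nasty/ads/advert1234.gif",
                "/banners/from/hell/advert99.jpg"]
SHOULD_NOT_MATCH = ["advert1.gif", "/adverts232.jpg", "/advert1.jsp"]

for path in SHOULD_MATCH + SHOULD_NOT_MATCH:
    print(path, "->", is_advert(path))
```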
+ +
We are barely scratching the surface of regular expressions here so + that you can understand the default Privoxy configuration files, and maybe use this + knowledge to customize your own installation. There is much, much more + that can be done with regular expressions. Now that you know enough to + get started, you can learn more on your own :/
+ +
More reading on Perl Compatible Regular expressions: http://perldoc.perl.org/perlre.html
+ +
For information on regular expression based substitutions and their + applications in filters, please see the filter file tutorial in this manual.
+
+ +
+
14.2. Privoxy's + Internal Pages
+ +
Since Privoxy proxies each + requested web page, it is easy for Privoxy to trap certain special URLs. In this way, + we can talk directly to Privoxy, and + see how it is configured, see how our rules are being applied, change + these rules and other configuration options, and even turn Privoxy's filtering off, all with a web + browser.
+ +
The URLs listed below are the special ones that allow direct access + to Privoxy. Of course, Privoxy must be running to access these. If not, + you will get a friendly error message. Internet access is not necessary + either.
+ +
+
+
Privoxy main page:
+ +
+
http://config.privoxy.org/
+
+ +
There is a shortcut: http://p.p/ (But it doesn't provide a fall-back to a + real page, in case the request is not sent through Privoxy)
+
+ +
+
Show information about the current configuration, including + viewing and editing of actions files:
+ +
+
http://config.privoxy.org/show-status
+
+
+ +
+
Show the source code version numbers:
+ +
+
http://config.privoxy.org/show-version
+
+
+ +
+
Show the browser's request headers:
+ +
+
http://config.privoxy.org/show-request
+
+
+ +
+
Show which actions apply to a URL and why:
+ +
+
http://config.privoxy.org/show-url-info
+
+
+ +
+
Toggle Privoxy on or off. This feature can be turned off/on in + the main config file. When toggled + "off", "Privoxy" continues to run, but only as a + pass-through proxy, with no actions taking place:
+ +
+
http://config.privoxy.org/toggle
+
+ +
Short cuts. Turn off, then on:
+ +
+
http://config.privoxy.org/toggle?set=disable
+
+ +
+
http://config.privoxy.org/toggle?set=enable
+
+
+
+ +
These may be bookmarked for quick reference. See next.
+ +
+
14.2.1. + Bookmarklets
+ +
Below are some "bookmarklets" to allow + you to easily access a "mini" version of + some of Privoxy's special pages. + They are designed for MS Internet Explorer, but should work equally + well in Netscape, Mozilla, and other browsers which support + JavaScript. They are designed to run directly from your bookmarks - + not by clicking the links below (although that should work for + testing).
+ +
To save them, right-click the link and choose "Add to Favorites" (IE) or "Add + Bookmark" (Netscape). You will get a warning that the bookmark + "may not be safe" - just click OK. Then + you can run the Bookmarklet directly from your favorites/bookmarks. + For even faster access, you can put them on the "Links" bar (IE) or the "Personal + Toolbar" (Netscape), and run them with a single click.
+ +
+
+
Privoxy - Enable
+
+ +
+
Privoxy - Disable
+
+ +
+
Privoxy - Toggle Privoxy (Toggles between + enabled and disabled)
+
+ +
+
Privoxy - View Status
+
+ +
+
Privoxy - Why?
+
+
+ +
Credit: The site which gave us the general idea for these + bookmarklets is www.bookmarklets.com. They have more information about + bookmarklets.
+
+
+ +
+
14.3. Chain of + Events
+ +
Let's take a quick look at how some of Privoxy's core features are triggered, and the + ensuing sequence of events when a web page is requested by your + browser:
+ +
+
+
First, your web browser requests a web page. The browser knows + to send the request to Privoxy, + which will in turn, relay the request to the remote web server + after passing the following tests:
+
- { -no-cookies-keep -no-cookies-read -no-cookies-set } - .google.com +
+
Privoxy traps any request for + its own internal CGI pages (e.g http://p.p/) and sends the CGI page back to the + browser.
+
+ +
+
Next, Privoxy checks to see if + the URL matches any "+block" patterns. If so, the URL is then + blocked, and the remote web server will not be contacted. "+handle-as-image" and "+handle-as-empty-document" are then checked, + and if there is no match, an HTML "BLOCKED" page is sent back to the browser. + Otherwise, if it does match, an image is returned for the former, + and an empty text document for the latter. The type of image would + depend on the setting of "+set-image-blocker" (blank, checkerboard + pattern, or an HTTP redirect to an image elsewhere).
+
+ +
+
Untrusted URLs are blocked. If URLs are being added to the + trust file, then that is done.
+
+ +
+
If the URL pattern matches the "+fast-redirects" action, it is then processed. + Unwanted parts of the requested URL are stripped.
+
+ +
+
Now the rest of the client browser's request headers are + processed. If any of these match any of the relevant actions (e.g. + "+hide-user-agent", etc.), headers are + suppressed or forged as determined by these actions and their + parameters.
+
+ +
+
Now the web server starts sending its response back (i.e. + typically a web page).
+
+ +
+
First, the server headers are read and processed to determine, + among other things, the MIME type (document type) and encoding. The + headers are then filtered as determined by the "+crunch-incoming-cookies", "+session-cookies-only", and "+downgrade-http-version" actions.
+
+ +
+
If any "+filter" action or "+deanimate-gifs" action applies (and the document type fits the action), the rest of the page is read into memory (up to a configurable limit). Then the filter rules (from default.filter and any other filter files) are processed against the buffered content. Filters are applied in the order they are specified in one of the filter files. Animated GIFs, if present, are reduced to either the first or last frame, depending on the action setting. The entire page, which is now filtered, is then sent by Privoxy back to your browser.
+ +
If neither a "+filter" action nor "+deanimate-gifs" matches, then Privoxy passes the raw data through to the client browser as it becomes available.
+
+ +
+
As the browser receives the now (possibly filtered) page + content, it reads and then requests any URLs that may be embedded + within the page source, e.g. ad images, stylesheets, JavaScript, + other HTML documents (e.g. frames), sounds, etc. For each of these + objects, the browser issues a separate request (this is easily + viewable in Privoxy's logs). And + each such request is in turn processed just as above. Note that a + complex web page will have many, many such embedded URLs. If these + secondary requests are to a different server, then quite possibly a + very differing set of actions is triggered.
+
+
+ +
NOTE: This is somewhat of a simplistic overview of what happens with + each URL request. For the sake of brevity and simplicity, we have + focused on Privoxy's core features + only.
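To make the blocking step in the sequence above a little more concrete, here is a toy decision function. It is a simplified illustration of the order described in the text, not how Privoxy is actually implemented; the function name, the dictionary of action flags, and the returned labels are all made up for this sketch:

```python
def blocked_response(actions):
    """Decide what the browser gets back for one request (toy model).

    `actions` maps action names to True/False, as they would apply to the URL.
    """
    if not actions.get("block"):
        return "forward to web server"
    if actions.get("handle-as-image"):
        return "image"               # which image depends on set-image-blocker
    if actions.get("handle-as-empty-document"):
        return "empty document"
    return "BLOCKED page"

print(blocked_response({"block": True}))
print(blocked_response({"block": True, "handle-as-image": True}))
print(blocked_response({"block": True, "handle-as-empty-document": True}))
print(blocked_response({"block": False, "handle-as-image": True}))
```

Note that the image and empty-document checks only matter once the URL has matched a "+block" pattern.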
+
+ +
+
14.4. + Troubleshooting: Anatomy of an Action
+ +
The way Privoxy applies actions and filters to any given URL can be complex, + and not always so easy to understand what is happening. And sometimes + we need to be able to see just what Privoxy is doing. Especially, if something + Privoxy is doing is causing us a + problem inadvertently. It can be a little daunting to look at the + actions and filters files themselves, since they tend to be filled with + regular expressions whose + consequences are not always so obvious.
+ +
One quick test to see if Privoxy is + causing a problem or not, is to disable it temporarily. This should be + the first troubleshooting step. See the Bookmarklets section on a quick + and easy way to do this (be sure to flush caches afterward!). Looking + at the logs is a good idea too. (Note that both the toggle feature and + logging are enabled via config file settings, + and may need to be turned "on".)
+ +
Another easy troubleshooting step to try is if you have done any + customization of your installation, revert back to the installed + defaults and see if that helps. There are times the developers get + complaints about one thing or another, and the problem is more related + to a customized configuration issue.
+ +
Privoxy also provides the http://config.privoxy.org/show-url-info page that can show + us very specifically how actions are + being applied to any given URL. This is a big help for + troubleshooting.
+ +
First, enter one URL (or partial URL) at the prompt, and then + Privoxy will tell us how the current + configuration will handle it. This will not help with filtering effects + (i.e. the "+filter" action) from one of the filter files since + this is handled very differently and not so easy to trap! It also will + not tell you about any other URLs that may be embedded within the URL + you are testing. For instance, images such as ads are expressed as URLs + within the raw page source of HTML pages. So you will only get info for + the actual URL that is pasted into the prompt area -- not any sub-URLs. + If you want to know about embedded URLs like ads, you will have to dig + those out of the HTML source. Use your browser's "View Page Source" option for this. Or right click on + the ad, and grab the URL.
+ +
Let's try an example, google.com, and look at it one section at a time in a sample + configuration (your real configuration may vary):
+ + + +
+
+ Matches for http://www.google.com: + + In file: default.action [ View ] [ Edit ] + + {+change-x-forwarded-for{block} + +deanimate-gifs {last} + +fast-redirects {check-decoded-url} + +filter {refresh-tags} + +filter {img-reorder} + +filter {banners-by-size} + +filter {webbugs} + +filter {jumping-windows} + +filter {ie-exploits} + +hide-from-header {block} + +hide-referrer {forge} + +session-cookies-only + +set-image-blocker {pattern} +/ + + { -session-cookies-only } + .google.com { -fast-redirects } - .google.com - -
This is much more informative, and tells us how we have defined our - "actions", and which ones match for our example, - "google.com". The first grouping shows our default - settings, which would apply to all URLs. If you look at your "actions" - file, this would be the section just below the "aliases" section - near the top. This applies to all URLs as signified by the single forward - slash -- "/". -
These are the default actions we have enabled. But we can define additional actions that would be exceptions to these general rules, and then list specific URLs that these exceptions would apply to. Last match wins. Just below this then are two explicit matches for ".google.com". The first is negating our various cookie blocking actions (i.e. we will allow cookies here). The second is allowing "fast-redirects". Note that there is a leading dot here -- ".google.com". This will match any hosts and sub-domains, in the google.com domain also, such as "www.google.com". So, apparently, we have these actions defined somewhere in the lower part of our actions file, and "google.com" is referenced in these sections.
And now we pull it all together in the bottom section and summarize how Privoxy is applying all its "actions" to "google.com":

Final results: - - -add-header -block -deanimate-gifs -downgrade -fast-redirects - +filter{html-annoyances} +filter{js-annoyances} +filter{no-popups} - +filter{webbugs} +filter{nimda} +filter{banners-by-size} +filter{hal} - +filter{fun} +hide-forwarded +hide-from{block} +hide-referer{forge} - -hide-user-agent -image +image-blocker{blank} -limit-connect +no-compression - -no-cookies-keep -no-cookies-read -no-cookies-set +no-popups -vanilla-wafer - -wafer - -
Now another example, "ad.doubleclick.net":
+ +
{ +block +image } - .ad.doubleclick.net - - { +block +image } + .google.com + +In file: user.action [ View ] [ Edit ] +(no matches in this file) +
+
+ +
This is telling us how we have defined our "actions", and which ones match for our test case, "google.com". Displayed are all the actions that are available to us. Remember, the + sign denotes "on" and the - sign denotes "off". So some are "on" here, but many are "off". Each example we try may provide a slightly different end result, depending on our configuration directives.
+ +
The first listing is for our default.action file. The large, multi-line listing shows how the actions are set for all URLs, i.e. our default settings. If you look at your "actions" file, this would be the section just below the "aliases" section near the top. This applies to all URLs, as signified by the single forward slash at the end of the listing -- " / ".
+ +
But we have defined additional actions that are exceptions to these general rules, and then we list specific URLs (or patterns) that these exceptions apply to. Last match wins. Just below this are two explicit matches for ".google.com". The first negates our previous cookie setting, which was "+session-cookies-only" (i.e. not persistent). So we will allow persistent cookies for google, at least that is how it is in this example. The second turns off any "+fast-redirects" action, allowing redirects to take place unmolested. Note that there is a leading dot here -- ".google.com". This will match any hosts and sub-domains in the google.com domain, such as "www.google.com" or "mail.google.com". But it would not match "www.google.de"! So, apparently, we have these two actions defined as exceptions to the general rules at the top, somewhere in the lower part of our default.action file, and "google.com" is referenced somewhere in these latter sections.
+ +
Then, for our user.action file, we again have no hits. So there is nothing google-specific that we might have added to our own, local configuration. If there were, those actions would overrule any actions from previously processed files, such as default.action. user.action typically has the last word. This is the best place to put hard and fast exceptions.
+ +
And finally we pull it all together in the bottom section and + summarize how Privoxy is applying all + its "actions" to "google.com":
+ + + + + +
+
+ Final results: + + -add-header + -block + +change-x-forwarded-for{block} + -client-header-filter{hide-tor-exit-notation} + -content-type-overwrite + -crunch-client-header + -crunch-if-none-match + -crunch-incoming-cookies + -crunch-outgoing-cookies + -crunch-server-header + +deanimate-gifs {last} + -downgrade-http-version + -fast-redirects + -filter {js-events} + -filter {content-cookies} + -filter {all-popups} + -filter {banners-by-link} + -filter {tiny-textforms} + -filter {frameset-borders} + -filter {demoronizer} + -filter {shockwave-flash} + -filter {quicktime-kioskmode} + -filter {fun} + -filter {crude-parental} + -filter {site-specifics} + -filter {js-annoyances} + -filter {html-annoyances} + +filter {refresh-tags} + -filter {unsolicited-popups} + +filter {img-reorder} + +filter {banners-by-size} + +filter {webbugs} + +filter {jumping-windows} + +filter {ie-exploits} + -filter {google} + -filter {yahoo} + -filter {msn} + -filter {blogspot} + -filter {no-ping} + -force-text-mode + -handle-as-empty-document + -handle-as-image + -hide-accept-language + -hide-content-disposition + +hide-from-header {block} + -hide-if-modified-since + +hide-referrer {forge} + -hide-user-agent + -limit-connect + -overwrite-last-modified + -prevent-compression + -redirect + -server-header-filter{xml-to-html} + -server-header-filter{html-to-xml} + -session-cookies-only + +set-image-blocker {pattern} +
+
+ +
Notice that the only differences from the previous listing are "fast-redirects" and "session-cookies-only", which are switched off specifically for this site in our configuration, and thus show as "-" in the "Final results".
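The "last match wins" behaviour described above can be illustrated with a minimal sketch. This is not Privoxy's implementation: the function name `apply_sections`, the dictionary representation of actions, and the stand-in matcher are all assumptions made for illustration, and the section list is a heavily abbreviated version of the listing above.

```python
def apply_sections(sections, url_matches):
    """Merge action sections in file order; a later match overrides an earlier one."""
    final = {}
    for pattern, actions in sections:
        if url_matches(pattern):
            final.update(actions)  # later sections overwrite earlier settings
    return final


# Hypothetical, abbreviated sections: True stands for "+", False for "-".
sections = [
    ("/", {"fast-redirects": True, "session-cookies-only": True}),
    (".google.com", {"session-cookies-only": False}),
    (".google.com", {"fast-redirects": False}),
]

# Stand-in matcher for this example: every pattern is assumed to match the URL.
result = apply_sections(sections, lambda pattern: True)
print(result)
```

Both defaults from the "/" section end up switched off, because the two ".google.com" sections are processed later.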
+ +
Now another example, "ad.doubleclick.net":
+ + + +
+
+ { +block{Domains starts with "ad"} } ad*. - { +block +image } - .doubleclick.net - -
We'll just show the interesting part here, the explicit matches. It is - matched three different times. Each as an "+block +image", - which is the expanded form of one of our aliases that had been defined as: - "+imageblock". ("Aliases" are defined in the - first section of the actions file and typically used to combine more - than one action.)
Any one of these would have done the trick and blocked this as an unwanted - image. This is unnecessarily redundant since the last case effectively - would also cover the first. No point in taking chances with these guys - though ;-) Note that if you want an ad or obnoxious - URL to be invisible, it should be defined as "ad.doubleclick.net" - is done here -- as both a "+block" and an - "+image". The custom alias "+imageblock" does this - for us.
One last example. Let's try "http://www.rhapsodyk.net/adsl/HOWTO/". - This one is giving us problems. We are getting a blank page. Hmmm...
+ +
Matches for http://www.rhapsodyk.net/adsl/HOWTO/: - - { -add-header -block +deanimate-gifs -downgrade +fast-redirects - +filter{html-annoyances} +filter{js-annoyances} +filter{no-popups} - +filter{webbugs} +filter{nimda} +filter{banners-by-size} +filter{hal} - +filter{fun} +hide-forwarded +hide-from{block} +hide-referer{forge} - -hide-user-agent -image +image-blocker{blank} +no-compression - +no-cookies-keep -no-cookies-read -no-cookies-set +no-popups - -vanilla-wafer -wafer } + { +block{Domain contains "ad"} } + .ad. + + { +block{Doubleclick banner server} +handle-as-image } + .[a-vx-z]*.doubleclick.net +
+
+ +
We'll just show the interesting part here: the explicit matches. It is matched three different times: by two "+block{}" sections, and by a "+block{} +handle-as-image" section, which is the expanded form of one of our aliases that had been defined as "+block-as-image". ("Aliases" are defined in the first section of the actions file and are typically used to combine more than one action.)
+ +
Any one of these would have done the trick and blocked this as an unwanted image. This is unnecessarily redundant since the last case effectively would also cover the first. No point in taking chances with these guys though ;-) Note that if you want an ad or obnoxious URL to be invisible, it should be defined as "ad.doubleclick.net" is done here -- as both a "+block{}" and a "+handle-as-image". The custom alias "+block-as-image" just simplifies the process and makes it more readable.
+ +
One last example. Let's try "http://www.example.net/adsl/HOWTO/". This one is giving + us problems. We are getting a blank page. Hmmm ...
+ + + + + +
+
+ Matches for http://www.example.net/adsl/HOWTO/: + + In file: default.action [ View ] [ Edit ] + + {-add-header + -block + +change-x-forwarded-for{block} + -client-header-filter{hide-tor-exit-notation} + -content-type-overwrite + -crunch-client-header + -crunch-if-none-match + -crunch-incoming-cookies + -crunch-outgoing-cookies + -crunch-server-header + +deanimate-gifs + -downgrade-http-version + +fast-redirects {check-decoded-url} + -filter {js-events} + -filter {content-cookies} + -filter {all-popups} + -filter {banners-by-link} + -filter {tiny-textforms} + -filter {frameset-borders} + -filter {demoronizer} + -filter {shockwave-flash} + -filter {quicktime-kioskmode} + -filter {fun} + -filter {crude-parental} + -filter {site-specifics} + -filter {js-annoyances} + -filter {html-annoyances} + +filter {refresh-tags} + -filter {unsolicited-popups} + +filter {img-reorder} + +filter {banners-by-size} + +filter {webbugs} + +filter {jumping-windows} + +filter {ie-exploits} + -filter {google} + -filter {yahoo} + -filter {msn} + -filter {blogspot} + -filter {no-ping} + -force-text-mode + -handle-as-empty-document + -handle-as-image + -hide-accept-language + -hide-content-disposition + +hide-from-header{block} + +hide-referer{forge} + -hide-user-agent + -overwrite-last-modified + +prevent-compression + -redirect + -server-header-filter{xml-to-html} + -server-header-filter{html-to-xml} + +session-cookies-only + +set-image-blocker{blank} } / - { +block +image } + { +block{Path contains "ads".} +handle-as-image } /ads +
+
-

Ooops, the "/adsl/" is matching "/ads"! But - we did not want this at all! Now we see why we get the blank page. We could - now add a new action below this that explictly does not - block (-block) pages with "adsl". There are various ways to - handle such exceptions. Example:

{ -block } - /adsl - -

Now the page displays ;-) Be sure to flush your browser's caches when - making such changes. Or, try using Shift+Reload.

But now what about a situation where we get no explicit matches like - we did with:

+ +
{ -block } +
Ooops, the "/adsl/" is matching "/ads" in our configuration! But we did not want this at all! Now we see why we get the blank page. It is actually triggering two different actions here, and the effects are aggregated so that the URL is blocked, and Privoxy is told to treat the block as if it were an image. But this is, of course, all wrong. We could now add a new action below this (or better, in our own user.action file) that explicitly unblocks ("{-block}") paths with "adsl" in them (remember, the last match in the configuration wins). There are various ways to handle such exceptions. Example:
+ + + +
+
+ { -block } /adsl - -
That actually was very telling and pointed us quickly to where the problem - was. If you don't get this kind of match, then it means one of the default - rules in the first section is causing the problem. This would require some - guesswork, and maybe a little trial and error to isolate the offending rule. - One likely cause would be one of the "{+filter}" actions. Try - adding the URL for the site to one of aliases that turn off "+filter":
+ +
{shop} +
+
+ +
Now the page displays ;-) Remember to flush your browser's caches when making these kinds of changes to your configuration to ensure that you get a freshly delivered page! Or, try using Shift+Reload.
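Why "/ads" catches "/adsl/" can be shown with a short sketch. It assumes that a path pattern behaves like a regular expression anchored only at the start of the path, which is consistent with the behaviour seen above; the helper name `path_matches` is made up for illustration and is not part of Privoxy.

```python
import re


def path_matches(pattern, path):
    # re.match anchors at the start only, so the pattern acts as a prefix test.
    return re.match(pattern, path) is not None


print(path_matches("/ads", "/adsl/HOWTO/"))   # True: "/adsl" begins with "/ads"
print(path_matches("/adsl", "/adsl/HOWTO/"))  # True: the "{ -block }" exception also applies
print(path_matches("/ads", "/news/ads/"))     # False: "/ads" is not at the start of the path
```

Since both patterns match our problem URL, the "{ -block }" section only helps if it is processed after the "/ads" section.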
+ +
But now what about a situation where we get no explicit matches like + we did with:
+ + + + + +
+
+ { +block{Path starts with "ads".} +handle-as-image } + /ads +
+
+ +
That actually was very helpful and pointed us quickly to where the problem was. If you don't get this kind of match, then it means one of the default rules in the first section of default.action is causing the problem. This would require some guesswork, and maybe a little trial and error to isolate the offending rule. One likely cause would be one of the "+filter" actions. These tend to be harder to troubleshoot. Try adding the URL for the site to one of the aliases that turn off "+filter":
+ + + +
+
+ { shop } .quietpc.com .worldpay.com # for quietpc.com .jungle.com .scan.co.uk .forbes.com - -
"{shop}" is an "alias" that expands to - "{ -filter -no-cookies -no-cookies-keep }". Or you could do - your own exception to negate filtering:
+ +
{-filter} +
+
+ +
"{ shop }" is an "alias" that expands to "{ -filter -session-cookies-only }". Or you could do your own exception to negate filtering:
+ + + +
+
+ { -filter } + # Disable ALL filter actions for sites in this section .forbes.com - -
"{fragile}" is an alias that disables most actions. This can be - used as a last resort for problem sites. Remember to flush caches! If this - still does not work, you will have to go through the remaining actions one by - one to find which one(s) is causing the problem.
\ No newline at end of file + developer.ibm.com + localhost +
+
+ +

This would turn off all filtering for these sites. This is best put in user.action, for local site exceptions. Note that when a simple domain pattern is used by itself (without the subsequent path portion), all sub-pages within that domain are included automatically in the scope of the action.

+ +

Images that are inexplicably being blocked may well be hitting the "+filter{banners-by-size}" rule, which assumes that images of certain sizes are ad banners (works well most of the time since these tend to be standardized).

+ +

"{ fragile }" is an alias that disables most of the actions that are most likely to cause trouble. This can be used as a last resort for problem sites.

+ + + + + +

+ { fragile }
+ # Handle with care: easy to break
+ mail.google.
+ mybank.example.com
+

+ +

Remember to flush caches! Note that the "mail.google." reference lacks the TLD portion (e.g. ".com"). This will effectively match mail.google under any TLD, such as "mail.google.de", just as an example.
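The effect of leading and trailing dots in domain patterns can be sketched as follows. This is a simplified illustration, not Privoxy's actual matching code: the function `host_matches` is hypothetical, it compares dot-separated labels only, and it uses Python's `fnmatch` globbing as a stand-in for the wild-cards allowed inside a label.

```python
from fnmatch import fnmatchcase


def host_matches(pattern, host):
    """Simplified sketch of domain-pattern matching (not Privoxy's code)."""
    open_left = pattern.startswith(".")   # leading dot: any sub-domain allowed
    open_right = pattern.endswith(".")    # trailing dot: any TLD allowed
    want = [p for p in pattern.lower().split(".") if p]
    have = host.lower().split(".")
    if len(want) > len(have):
        return False

    def labels_match(offset):
        return all(fnmatchcase(have[offset + i], w) for i, w in enumerate(want))

    last = len(have) - len(want)
    if open_left and open_right:
        offsets = range(last + 1)           # may sit anywhere in the host name
    elif open_left:
        offsets = [last]                    # must line up with the end
    elif open_right:
        offsets = [0]                       # must line up with the start
    else:
        offsets = [0] if last == 0 else []  # must cover the whole host name
    return any(labels_match(o) for o in offsets)


print(host_matches(".google.com", "www.google.com"))   # True
print(host_matches(".google.com", "www.google.de"))    # False
print(host_matches("mail.google.", "mail.google.de"))  # True
```

Under these assumptions, ".google.com" is tied to the end of the host name, while "mail.google." is tied to its start and leaves the TLD open.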

+ +

If this still does not work, you will have to go through the remaining actions one by one to find which one(s) is causing the problem.