X-Git-Url: http://www.privoxy.org/gitweb/?p=privoxy.git;a=blobdiff_plain;f=doc%2Fwebserver%2Fuser-manual%2Fappendix.html;h=38f4c8b0eb2b4832ac7d7cdd622129785236f396;hp=0b2ca1afcbb1cfd7e978a39fe80b9f3701ca9978;hb=42c361793c45b0d5fc0c116707ca12b2f60f4c52;hpb=2afbba3d1be8e0c53169a05faeab2ac24ccc23b1 diff --git a/doc/webserver/user-manual/appendix.html b/doc/webserver/user-manual/appendix.html index 0b2ca1af..38f4c8b0 100644 --- a/doc/webserver/user-manual/appendix.html +++ b/doc/webserver/user-manual/appendix.html @@ -1,1454 +1,681 @@ - -Appendix - -
Privoxy 3.0.16 User Manual

14. Appendix

14.1. Regular Expressions

Privoxy uses Perl-style "regular - expressions" in its actions - files and filter file, - through the PCRE and - PCRS libraries.

If you are reading this, you probably don't understand what "regular - expressions" are, or what they can do. So this will be a very brief - introduction only. A full explanation would require a book ;-)

Regular expressions provide a language to describe patterns that can be run against strings of characters (letters, numbers, etc.), to see if they match the string or not. The patterns are themselves (sometimes complex) strings of literal characters, combined with wild-cards, and other special characters, called meta-characters. The "meta-characters" have special meanings and are used to build complex patterns to be matched against. Perl Compatible Regular Expressions are an especially convenient "dialect" of the regular expression language.

To make a simple analogy, we do something similar when we use wild-card characters when listing files with the dir command in DOS. *.* matches all filenames. The "special" character here is the asterisk which matches any and all characters. We can be more specific and use ? to match just individual characters. So "dir file?.txt" would match "file1.txt", "file2.txt", etc. We are pattern matching, using a similar technique to "regular expressions"!

Regular expressions do essentially the same thing, but are much, much more - powerful. There are many more "special characters" and ways of - building complex patterns however. Let's look at a few of the common ones, - and then some examples:

. - Matches any single character, e.g. "a", - "A", "4", ":", or "@". -

? - The preceding character or expression is matched ZERO or ONE - times. Either/or. -

+ - The preceding character or expression is matched ONE or MORE - times. -

* - The preceding character or expression is matched ZERO or MORE - times. -

\ - The "escape" character denotes that - the following character should be taken literally. This is used where one of the - special characters (e.g. ".") needs to be taken literally and - not as a special meta-character. Example: "example\.com", makes - sure the period is recognized only as a period (and not expanded to its - meta-character meaning of any single character). -

[ ] - Characters enclosed in brackets will be matched if any of the enclosed characters are encountered. For instance, "[0-9]" matches any numeric digit (zero through nine). As an example, we can combine this with "+" to match any digit one or more times: "[0-9]+".

( ) - parentheses are used to group a sub-expression, - or multiple sub-expressions. -

| - The "bar" character works like an - "or" conditional statement. A match is successful if the - sub-expression on either side of "|" matches. As an example: - "/(this|that) example/" uses grouping and the bar character - and would match either "this example" or "that - example", and nothing else. -

These are just some of the ones you are likely to use when matching URLs with Privoxy, and are a long way from a definitive list. This is enough to get us started with a few simple examples which may be more illuminating:

/.*/banners/.* - A simple example that uses the common combination of "." and "*" to denote any character, zero or more times. In other words, any string at all. So we start with a literal forward slash, then our regular expression pattern (".*"), another literal forward slash, the string "banners", another forward slash, and lastly another ".*". We are building a directory path here. This will match any file with the path that has a directory named "banners" in it. The ".*" matches any characters, and this could conceivably be more forward slashes, so it might expand into a much longer looking path. For example, this could match: "/eye/hate/spammers/banners/annoy_me_please.gif", or just "/ads/banners/annoying.html", or almost an infinite number of other possible combinations, just so it has "banners" in the path somewhere. Note that because the pattern has a literal slash both before and after the first ".*", there must be at least two slashes ahead of "banners": "//banners/annoying.html" would match, but "/banners/annoying.html" would not.

And now something a little more complex:

/.*/adv((er)?ts?|ertis(ing|ements?))?/ - - We have several literal forward slashes again ("/"), so we are - building another expression that is a file path statement. We have another - ".*", so we are matching against any conceivable sub-path, just so - it matches our expression. The only true literal that must - match our pattern is adv, together with - the forward slashes. What comes after the "adv" string is the - interesting part.

Remember the "?" means the preceding expression (either a - literal character or anything grouped with "(...)" in this case) - can exist or not, since this means either zero or one match. So - "((er)?ts?|ertis(ing|ements?))" is optional, as are the - individual sub-expressions: "(er)", - "(ing|ements?)", and the "s". The "|" - means "or". We have two of those. For instance, - "(ing|ements?)", can expand to match either "ing" - OR "ements?". What is being done here, is an - attempt at matching as many variations of "advertisement", and - similar, as possible. So this would expand to match just "adv", - or "advert", or "adverts", or - "advertising", or "advertisement", or - "advertisements". You get the idea. But it would not match - "advertizements" (with a "z"). We could fix that by - changing our regular expression to: - "/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/", which would then match - either spelling.

/.*/advert[0-9]+\.(gif|jpe?g) - Again another path statement with forward slashes. Anything in the square brackets "[ ]" can be matched. This is using "0-9" as a shorthand expression to mean any digit zero through nine. It is the same as saying "0123456789". So any digit matches. The "+" means one or more of the preceding expression must be included. The preceding expression here is what is in the square brackets -- in this case, any digit zero through nine. Then, at the end, we have a grouping: "(gif|jpe?g)". This includes a "|", so this needs to match the expression on either side of that bar character also. A simple "gif" on one side, and the other side will in turn match either "jpeg" or "jpg", since the "?" means the letter "e" is optional and can be matched once or not at all. So we are building an expression here to match a GIF or JPEG image file. It must include the literal string "advert", then one or more digits, and a "." (which is now a literal, and not a special character, since it is escaped with "\"), and lastly either "gif", or "jpeg", or "jpg". Some possible matches would include: "//advert1.jpg", "/nasty/ads/advert1234.gif", "/banners/from/hell/advert99.jpg". It would not match "advert1.gif" (no leading slash), or "/adverts232.jpg" (the expression does not include an "s"), or "/advert1.jsp" ("jsp" is not in the expression anywhere).

We are barely scratching the surface of regular expressions here so that you - can understand the default Privoxy - configuration files, and maybe use this knowledge to customize your own - installation. There is much, much more that can be done with regular - expressions. Now that you know enough to get started, you can learn more on - your own :/

More reading on Perl Compatible Regular expressions: - http://perldoc.perl.org/perlre.html

For information on regular expression based substitutions and their applications - in filters, please see the filter file tutorial - in this manual.

14.2. Privoxy's Internal Pages

Since Privoxy proxies each requested - web page, it is easy for Privoxy to - trap certain special URLs. In this way, we can talk directly to - Privoxy, and see how it is - configured, see how our rules are being applied, change these - rules and other configuration options, and even turn - Privoxy's filtering off, all with - a web browser.

The URLs listed below are the special ones that allow direct access - to Privoxy. Of course, - Privoxy must be running to access these. If - not, you will get a friendly error message. Internet access is not - necessary either.

These may be bookmarked for quick reference. See next.

14.2.1. Bookmarklets

Below are some "bookmarklets" to allow you to easily access a - "mini" version of some of Privoxy's - special pages. They are designed for MS Internet Explorer, but should work - equally well in Netscape, Mozilla, and other browsers which support - JavaScript. They are designed to run directly from your bookmarks - not by - clicking the links below (although that should work for testing).

To save them, right-click the link and choose "Add to Favorites" - (IE) or "Add Bookmark" (Netscape). You will get a warning that - the bookmark "may not be safe" - just click OK. Then you can run the - Bookmarklet directly from your favorites/bookmarks. For even faster access, - you can put them on the "Links" bar (IE) or the "Personal - Toolbar" (Netscape), and run them with a single click.

Credit: The site which gave us the general idea for these bookmarklets is - www.bookmarklets.com. They - have more information about bookmarklets.

14.3. Chain of Events

Let's take a quick look at how some of Privoxy's - core features are triggered, and the ensuing sequence of events when a web - page is requested by your browser:

NOTE: This is somewhat of a simplistic overview of what happens with each URL - request. For the sake of brevity and simplicity, we have focused on - Privoxy's core features only.

14.4. Troubleshooting: Anatomy of an Action

The way Privoxy applies - actions and filters - to any given URL can be complex, and not always so - easy to understand what is happening. And sometimes we need to be able to - see just what Privoxy is - doing. Especially, if something Privoxy is doing - is causing us a problem inadvertently. It can be a little daunting to look at - the actions and filters files themselves, since they tend to be filled with - regular expressions whose consequences are not - always so obvious.

One quick test to see if Privoxy is causing a problem - or not, is to disable it temporarily. This should be the first troubleshooting - step. See the Bookmarklets section on a quick - and easy way to do this (be sure to flush caches afterward!). Looking at the - logs is a good idea too. (Note that both the toggle feature and logging are - enabled via config file settings, and may need to be - turned "on".)

Another easy troubleshooting step to try is if you have done any - customization of your installation, revert back to the installed - defaults and see if that helps. There are times the developers get complaints - about one thing or another, and the problem is more related to a customized - configuration issue.

Privoxy also provides the - http://config.privoxy.org/show-url-info - page that can show us very specifically how actions - are being applied to any given URL. This is a big help for troubleshooting.

First, enter one URL (or partial URL) at the prompt, and then - Privoxy will tell us - how the current configuration will handle it. This will not - help with filtering effects (i.e. the "+filter" action) from - one of the filter files since this is handled very - differently and not so easy to trap! It also will not tell you about any other - URLs that may be embedded within the URL you are testing. For instance, images - such as ads are expressed as URLs within the raw page source of HTML pages. So - you will only get info for the actual URL that is pasted into the prompt area - -- not any sub-URLs. If you want to know about embedded URLs like ads, you - will have to dig those out of the HTML source. Use your browser's "View - Page Source" option for this. Or right click on the ad, and grab the - URL.

Let's try an example, google.com, - and look at it one section at a time in a sample configuration (your real - configuration may vary):

+ +
 Matches for http://www.google.com:
+
+
+  
+    
+      Appendix
+    
+    
+    
+    
+    
+    
+    
+  
+  
+    
+    
+

+ 14. Appendix +

+
+

+ 14.1. Regular Expressions +

+

+ Privoxy uses Perl-style "regular expressions" in its actions files and filter file, through the PCRE and PCRS libraries. +

+

+ If you are reading this, you probably don't understand what "regular expressions" are, or what they can + do. So this will be a very brief introduction only. A full + explanation would require a book ;-) +

+

+ Regular expressions provide a language to describe patterns that can be run against strings of characters (letters, numbers, etc.), to see if they match the string or not. The patterns are themselves (sometimes complex) strings of literal characters, combined with wild-cards, and other special characters, called meta-characters. The "meta-characters" have special meanings and are used to build complex patterns to be matched against. Perl Compatible Regular Expressions are an especially convenient "dialect" of the regular expression language. +

+

+ To make a simple analogy, we do something similar when we use wild-card characters when listing files with the dir command in DOS. *.* matches all filenames. The "special" character here is the asterisk which matches any and all characters. We can be more specific and use ? to match just individual characters. So "dir file?.txt" would match "file1.txt", "file2.txt", etc. We are pattern matching, using a similar technique to "regular expressions"! +

+

+ Regular expressions do essentially the same thing, but are much, + much more powerful. There are many more "special characters" and ways of building complex + patterns however. Let's look at a few of the common ones, and then + some examples: +

+ + + + + + +
+ . - + Matches any single character, e.g. "a", "A", "4", ":", or + "@". +
- In file: default.action [ View ] [ Edit ] + + + + + + +
+ ? - The + preceding character or expression is matched ZERO or ONE + times. Either/or. +
+ + + + + + + +
+ + - The + preceding character or expression is matched ONE or MORE + times. +
+ + + + + + + +
+ * - The + preceding character or expression is matched ZERO or MORE + times. +
+ + + + + + + +
+ \ - The + "escape" character denotes that + the following character should be taken literally. This is + used where one of the special characters (e.g. ".") needs to be taken literally and not as a + special meta-character. Example: "example\.com", makes sure the period is + recognized only as a period (and not expanded to its + meta-character meaning of any single character). +
+ + + + + + + +
+ [ ] - Characters enclosed in brackets will be matched if any of the enclosed characters are encountered. For instance, "[0-9]" matches any numeric digit (zero through nine). As an example, we can combine this with "+" to match any digit one or more times: "[0-9]+". +
+ + + + + + + +
+ ( ) - + parentheses are used to group a sub-expression, or multiple + sub-expressions. +
+ + + + + + + +
+ | - The + "bar" character works like an + "or" conditional statement. A + match is successful if the sub-expression on either side of + "|" matches. As an example: "/(this|that) example/" uses grouping + and the bar character and would match either "this example" or "that + example", and nothing else. +
+ +
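The meta-characters listed above can be tried out directly. The following is a minimal sketch that uses Python's re module rather than PCRE itself; for these basic meta-characters the two should behave the same way. The helper name full_match and the sample strings (such as "colou?r") are illustrative assumptions and are not taken from Privoxy.

```python
import re

def full_match(pattern, text):
    """Return True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# (pattern, text) pairs, one for each meta-character in the list above.
samples = [
    (r".", "@"),                               # "." is any single character
    (r"colou?r", "color"),                     # "?" makes the "u" optional
    (r"[0-9]+", "2024"),                       # "+" needs one or more digits
    (r"[0-9]*", ""),                           # "*" also accepts zero digits
    (r"example\.com", "exampleXcom"),          # the escaped dot is a literal dot
    (r"(this|that) example", "that example"),  # grouping plus "|"
]

for pattern, text in samples:
    print(f"{pattern!r} vs {text!r}: {full_match(pattern, text)}")
```

Only the escaped-dot sample is expected to fail, since "X" is not a literal period.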

+ These are just some of the ones you are likely to use when matching URLs with Privoxy, and are a long way from a definitive list. This is enough to get us started with a few simple examples which may be more illuminating: +

+

+ /.*/banners/.* - A simple example that uses the common combination of "." and "*" to denote any character, zero or more times. In other words, any string at all. So we start with a literal forward slash, then our regular expression pattern (".*"), another literal forward slash, the string "banners", another forward slash, and lastly another ".*". We are building a directory path here. This will match any file with the path that has a directory named "banners" in it. The ".*" matches any characters, and this could conceivably be more forward slashes, so it might expand into a much longer looking path. For example, this could match: "/eye/hate/spammers/banners/annoy_me_please.gif", or just "/ads/banners/annoying.html", or almost an infinite number of other possible combinations, just so it has "banners" in the path somewhere. Note that because the pattern has a literal slash both before and after the first ".*", there must be at least two slashes ahead of "banners": "//banners/annoying.html" would match, but "/banners/annoying.html" would not. +
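The banner pattern can be checked with a few lines of code. This is a sketch only: it uses Python's re module instead of PCRE, and it assumes that the pattern is applied from the start of the URL path, which is what re.match does. The function name path_matches and the path "/images/logo.gif" are made up for illustration.

```python
import re

# The pattern from the text, compiled as an ordinary regular expression.
BANNERS = re.compile(r"/.*/banners/.*")

def path_matches(path):
    """Return True if the pattern matches, starting at the beginning of path."""
    return BANNERS.match(path) is not None

for path in ("/eye/hate/spammers/banners/annoy_me_please.gif",
             "/images/logo.gif"):
    print(path, path_matches(path))
```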

+

+ And now something a little more complex: +

+

+ /.*/adv((er)?ts?|ertis(ing|ements?))?/ - + We have several literal forward slashes again ("/"), so we are building another expression that is + a file path statement. We have another ".*", so we are matching against any conceivable + sub-path, just so it matches our expression. The only true literal + that must + match our pattern is adv, together with the forward slashes. What + comes after the "adv" string is the + interesting part. +

+

+ Remember the "?" means the preceding + expression (either a literal character or anything grouped with + "(...)" in this case) can exist or not, + since this means either zero or one match. So "((er)?ts?|ertis(ing|ements?))" is optional, as are + the individual sub-expressions: "(er)", + "(ing|ements?)", and the "s". The "|" means "or". We have two of those. For instance, + "(ing|ements?)", can expand to match + either "ing" OR "ements?". What is being done here, is an attempt at + matching as many variations of "advertisement", and similar, as possible. So this + would expand to match just "adv", or + "advert", or "adverts", or "advertising", or "advertisement", or "advertisements". You get the idea. But it would not + match "advertizements" (with a "z"). We could fix that by changing our + regular expression to: "/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/", which + would then match either spelling. +
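To see the optional groups at work, the two expressions discussed above can be compared side by side. This is a rough sketch in Python's re module (not PCRE), assuming the pattern is applied from the start of the path; the "/images/" prefix and the helper name matches are hypothetical.

```python
import re

ORIGINAL = re.compile(r"/.*/adv((er)?ts?|ertis(ing|ements?))?/")
WITH_Z = re.compile(r"/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/")

def matches(regex, path):
    """Return True if regex matches, starting at the beginning of path."""
    return regex.match(path) is not None

words = ["adv", "advert", "adverts", "advertising",
         "advertisement", "advertisements", "advertizements"]

for word in words:
    path = f"/images/{word}/"  # "/images/" is a hypothetical directory prefix
    print(word, matches(ORIGINAL, path), matches(WITH_Z, path))
```

Only "advertizements" should differ between the two columns, since the first expression has no alternative for the "z" spelling.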

+

+ /.*/advert[0-9]+\.(gif|jpe?g) - Again another path statement with forward slashes. Anything in the square brackets "[ ]" can be matched. This is using "0-9" as a shorthand expression to mean any digit zero through nine. It is the same as saying "0123456789". So any digit matches. The "+" means one or more of the preceding expression must be included. The preceding expression here is what is in the square brackets -- in this case, any digit zero through nine. Then, at the end, we have a grouping: "(gif|jpe?g)". This includes a "|", so this needs to match the expression on either side of that bar character also. A simple "gif" on one side, and the other side will in turn match either "jpeg" or "jpg", since the "?" means the letter "e" is optional and can be matched once or not at all. So we are building an expression here to match a GIF or JPEG image file. It must include the literal string "advert", then one or more digits, and a "." (which is now a literal, and not a special character, since it is escaped with "\"), and lastly either "gif", or "jpeg", or "jpg". Some possible matches would include: "//advert1.jpg", "/nasty/ads/advert1234.gif", "/banners/from/hell/advert99.jpg". It would not match "advert1.gif" (no leading slash), or "/adverts232.jpg" (the expression does not include an "s"), or "/advert1.jsp" ("jsp" is not in the expression anywhere). +
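The matches and non-matches listed above can be verified mechanically. As before, this is a sketch using Python's re module in place of PCRE, and it assumes the pattern is applied from the start of the path; the names ADVERT and matches are illustrative.

```python
import re

ADVERT = re.compile(r"/.*/advert[0-9]+\.(gif|jpe?g)")

def matches(path):
    """Return True if the pattern matches, starting at the beginning of path."""
    return ADVERT.match(path) is not None

# The example paths are the ones given in the text.
should_match = ["//advert1.jpg",
                "/nasty/ads/advert1234.gif",
                "/banners/from/hell/advert99.jpg"]
should_not_match = ["advert1.gif", "/adverts232.jpg", "/advert1.jsp"]

for path in should_match + should_not_match:
    print(path, matches(path))
```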

+

+ We are barely scratching the surface of regular expressions here so + that you can understand the default Privoxy configuration files, and maybe use + this knowledge to customize your own installation. There is much, + much more that can be done with regular expressions. Now that you + know enough to get started, you can learn more on your own :/ +

+

+ More reading on Perl Compatible Regular expressions: http://perldoc.perl.org/perlre.html +

+

+ For information on regular expression based substitutions and their + applications in filters, please see the filter file tutorial in this manual. +

+
+
+

+ 14.2. Privoxy's Internal Pages +

+

+ Since Privoxy proxies each + requested web page, it is easy for Privoxy to trap certain special URLs. In this + way, we can talk directly to Privoxy, and see how it is configured, see how + our rules are being applied, change these rules and other + configuration options, and even turn Privoxy's filtering off, all with a web + browser. +

+

+ The URLs listed below are the special ones that allow direct access + to Privoxy. Of course, Privoxy must be running to access these. + If not, you will get a friendly error message. Internet access is + not necessary either. +

+

+

+ +
+
+

+ 14.3. Chain of Events +

+

+ Let's take a quick look at how some of Privoxy's core features are triggered, and the + ensuing sequence of events when a web page is requested by your + browser: +

+

+

+
    +
  • +

    + First, your web browser requests a web page. The browser knows + to send the request to Privoxy, which will in turn, relay the + request to the remote web server after passing the following + tests: +

    +
  • +
  • +

    + Privoxy traps any request for its own internal CGI pages (e.g. http://p.p/) and sends the CGI page back to the browser. +

    +
  • +
  • +

    + Next, Privoxy checks to see if + the URL matches any "+block" patterns. If so, the URL is + then blocked, and the remote web server will not be contacted. + "+handle-as-image" and "+handle-as-empty-document" are then + checked, and if there is no match, an HTML "BLOCKED" page is sent back to the browser. + Otherwise, if it does match, an image is returned for the + former, and an empty text document for the latter. The type of + image would depend on the setting of "+set-image-blocker" (blank, checkerboard + pattern, or an HTTP redirect to an image elsewhere). +

    +
  • +
  • +

    + Untrusted URLs are blocked. If URLs are being added to the trust file, then that is done. +

    +
  • +
  • +

    + If the URL pattern matches the "+fast-redirects" action, it is then + processed. Unwanted parts of the requested URL are stripped. +

    +
  • +
  • +

    + Now the rest of the client browser's request headers are + processed. If any of these match any of the relevant actions + (e.g. "+hide-user-agent", etc.), headers are + suppressed or forged as determined by these actions and their + parameters. +

    +
  • +
  • +

    + Now the web server starts sending its response back (i.e. + typically a web page). +

    +
  • +
  • +

    + First, the server headers are read and processed to determine, + among other things, the MIME type (document type) and encoding. + The headers are then filtered as determined by the "+crunch-incoming-cookies", "+session-cookies-only", and "+downgrade-http-version" actions. +

    +
  • +
  • +

    + If any "+filter" action or "+deanimate-gifs" action applies (and the document type fits the action), the rest of the page is read into memory (up to a configurable limit). Then the filter rules (from default.filter and any other filter files) are processed against the buffered content. Filters are applied in the order they are specified in one of the filter files. Animated GIFs, if present, are reduced to either the first or last frame, depending on the action setting. The entire page, which is now filtered, is then sent by Privoxy back to your browser. +

    +

    + If neither a "+filter" action nor a "+deanimate-gifs" action matches, then Privoxy passes the raw data through to the client browser as it becomes available. +

    +
  • +
  • +

    + As the browser receives the now (possibly filtered) page + content, it reads and then requests any URLs that may be + embedded within the page source, e.g. ad images, stylesheets, + JavaScript, other HTML documents (e.g. frames), sounds, etc. + For each of these objects, the browser issues a separate + request (this is easily viewable in Privoxy's logs). And each such request is + in turn processed just as above. Note that a complex web page + will have many, many such embedded URLs. If these secondary + requests are to a different server, then quite possibly a very + differing set of actions is triggered. +

    +
  • +
+ +
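The blocking step in the list above can be summarised as a small decision function. This is a simplified model and not Privoxy's actual code: the function name blocked_response, the dictionary of active actions, and the returned labels are all hypothetical, and checking "+handle-as-image" before "+handle-as-empty-document" is an assumption, since the text does not say which wins when both apply.

```python
def blocked_response(actions):
    """Describe what the browser receives for a request, given the
    actions that apply to its URL. Returns None if it is not blocked.

    `actions` is a hypothetical dict mapping active action names
    (without the leading "+") to their parameter, or None.
    """
    if "block" not in actions:
        return None  # not blocked: the request continues to the next steps
    if "handle-as-image" in actions:
        # The kind of image depends on the "+set-image-blocker" setting.
        return "image (" + actions.get("set-image-blocker", "unspecified") + ")"
    if "handle-as-empty-document" in actions:
        return "empty text document"
    return "HTML BLOCKED page"

print(blocked_response({"block": "Doubleclick banner server",
                        "handle-as-image": None,
                        "set-image-blocker": "pattern"}))
```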

+ NOTE: This is somewhat of a simplistic overview of what happens + with each URL request. For the sake of brevity and simplicity, we + have focused on Privoxy's core + features only. +

+
+
+

+ 14.4. Troubleshooting: Anatomy of an + Action +

+

+ The way Privoxy applies actions and filters to any given URL can be complex, and it is not always easy to understand what is happening. Sometimes we need to be able to see just what Privoxy is doing, especially if something Privoxy is doing is inadvertently causing us a problem. It can be a little daunting to look at the actions and filters files themselves, since they tend to be filled with regular expressions whose consequences are not always so obvious. +

+

+ One quick test to see if Privoxy + is causing a problem or not, is to disable it temporarily. This + should be the first troubleshooting step (be sure to flush caches + afterward!). Looking at the logs is a good idea too. (Note that + both the toggle feature and logging are enabled via config file settings, and may need to be turned + "on".) +

+

+ If you have done any customization of your installation, another easy troubleshooting step is to revert to the installed defaults and see if that helps. There are times the developers get complaints about one thing or another, and the problem turns out to be related to a customized configuration. +

+

+ Privoxy also provides the http://config.privoxy.org/show-url-info page that can + show us very specifically how actions are being applied to any given URL. + This is a big help for troubleshooting. +

+

+ First, enter one URL (or partial URL) at the prompt, and then Privoxy will tell us how the current + configuration will handle it. This will not help with filtering + effects (i.e. the "+filter" action) from one of the filter files + since this is handled very differently and not so easy to trap! It + also will not tell you about any other URLs that may be embedded + within the URL you are testing. For instance, images such as ads + are expressed as URLs within the raw page source of HTML pages. So + you will only get info for the actual URL that is pasted into the + prompt area -- not any sub-URLs. If you want to know about embedded + URLs like ads, you will have to dig those out of the HTML source. + Use your browser's "View Page Source" + option for this. Or right click on the ad, and grab the URL. +

+

+ Let's try an example, google.com, and look at it one section at a time in a + sample configuration (your real configuration may vary): +

+

+

+ + +
+
+ Matches for http://www.google.com:
+
+ In file: default.action [ View ] [ Edit ]
 
  {+change-x-forwarded-for{block}
  +deanimate-gifs {last}
@@ -1464,180 +691,96 @@ CLASS="GUIBUTTON"
  +session-cookies-only
  +set-image-blocker {pattern}
 /
- 
+
  { -session-cookies-only }
  .google.com
 
  { -fast-redirects }
  .google.com
 
-In file: user.action [ View ] [ Edit ]
-(no matches in this file)  

This is telling us how we have defined our "actions", and which ones match for our test case, "google.com". Displayed are all the actions that are available to us. Remember, the + sign denotes "on". The - sign denotes "off". So some are "on" here, but many are "off". Each example we try may provide a slightly different end result, depending on our configuration directives.

The first listing - is for our default.action file. The large, multi-line - listing, is how the actions are set to match for all URLs, i.e. our default - settings. If you look at your "actions" file, this would be the - section just below the "aliases" section near the top. This - will apply to all URLs as signified by the single forward slash at the end - of the listing -- " / ".

But we have defined additional actions that would be exceptions to these general - rules, and then we list specific URLs (or patterns) that these exceptions - would apply to. Last match wins. Just below this then are two explicit - matches for ".google.com". The first is negating our previous - cookie setting, which was for "+session-cookies-only" - (i.e. not persistent). So we will allow persistent cookies for google, at - least that is how it is in this example. The second turns - off any "+fast-redirects" - action, allowing this to take place unmolested. Note that there is a leading - dot here -- ".google.com". This will match any hosts and - sub-domains, in the google.com domain also, such as - "www.google.com" or "mail.google.com". But it would not - match "www.google.de"! So, apparently, we have these two actions - defined as exceptions to the general rules at the top somewhere in the lower - part of our default.action file, and - "google.com" is referenced somewhere in these latter sections.

Then, for our user.action file, we again have no hits. So there is nothing google-specific that we might have added to our own, local configuration. If there was, those actions would over-rule any actions from previously processed files, such as default.action. user.action typically has the last word. This is the best place to put hard and fast exceptions.

And finally we pull it all together in the bottom section and summarize how - Privoxy is applying all its "actions" - to "google.com":

+ +

 Final results:
- 
+In file: user.action [ View ] [ Edit ]
+(no matches in this file)
+
+
+ +

+ This is telling us how we have defined our "actions", and which ones match for our test case, "google.com". Displayed are all the actions that are available to us. Remember, the + sign denotes "on". The - sign denotes "off". So some are "on" here, but many are "off". Each example we try may provide a slightly different end result, depending on our configuration directives. +

+

+ The first listing is for our default.action file. The large, multi-line listing, + is how the actions are set to match for all URLs, i.e. our default + settings. If you look at your "actions" + file, this would be the section just below the "aliases" section near the top. This will apply to + all URLs as signified by the single forward slash at the end of the + listing -- " / ". +

+

+ But we have defined additional actions that would be exceptions to + these general rules, and then we list specific URLs (or patterns) + that these exceptions would apply to. Last match wins. Just below + this then are two explicit matches for ".google.com". The first is negating our previous + cookie setting, which was for "+session-cookies-only" (i.e. not persistent). + So we will allow persistent cookies for google, at least that is + how it is in this example. The second turns off any "+fast-redirects" action, allowing this to take + place unmolested. Note that there is a leading dot here -- ".google.com". This will match any hosts and + sub-domains, in the google.com domain also, such as "www.google.com" or "mail.google.com". But it would not match "www.google.de"! So, apparently, we have these + two actions defined as exceptions to the general rules at the top + somewhere in the lower part of our default.action file, and "google.com" is referenced somewhere in these latter + sections. +
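The "last match wins" rule described above can be sketched as follows. This is a deliberately simplified model rather than Privoxy's real matching code: the function names, the list-of-tuples layout, and the host-matching rules (in particular, treating ".google.com" as also matching the bare "google.com") are assumptions made for illustration, and action parameters such as {last} are left out.

```python
def host_matches(pattern, host):
    """Simplified host matching: "/" matches everything, and a pattern
    with a leading dot matches that domain and any of its sub-domains."""
    if pattern == "/":
        return True
    if pattern.startswith("."):
        return host == pattern[1:] or host.endswith(pattern)
    return host == pattern

def effective_actions(sections, host):
    """Apply every matching section in order, so that the last match wins."""
    state = {}
    for pattern, actions in sections:
        if host_matches(pattern, host):
            for action in actions:
                sign, name = action[0], action[1:]
                state[name] = (sign == "+")  # "+" is on, "-" is off
    return state

# A cut-down version of the sample listing: defaults first, exceptions after.
sections = [
    ("/", ["+deanimate-gifs", "+session-cookies-only"]),
    (".google.com", ["-session-cookies-only"]),
    (".google.com", ["-fast-redirects"]),
]

print(effective_actions(sections, "www.google.com"))
print(effective_actions(sections, "www.google.de"))
```

Because the exceptions come after the defaults, they override them for hosts in the google.com domain only; "www.google.de" keeps the default settings.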

+

+ Then, for our user.action file, we again have no hits. So there is nothing google-specific that we might have added to our own, local configuration. If there was, those actions would over-rule any actions from previously processed files, such as default.action. user.action typically has the last word. This is the best place to put hard and fast exceptions. +

+

+ And finally we pull it all together in the bottom section and + summarize how Privoxy is applying + all its "actions" to "google.com": +

+

+

+ + +
+
+
 Final results:
+
  -add-header
  -block
- +change-x-forwarded-for{block} 
+ +change-x-forwarded-for{block}
  -client-header-filter{hide-tor-exit-notation}
  -content-type-overwrite
  -crunch-client-header
@@ -1688,142 +831,90 @@ CLASS="SCREEN"
  -prevent-compression
  -redirect
  -server-header-filter{xml-to-html}
- -server-header-filter{html-to-xml} 
+ -server-header-filter{html-to-xml}
  -session-cookies-only
- +set-image-blocker {pattern} 

Notice the only difference here to the previous listing, is to - "fast-redirects" and "session-cookies-only", - which are activated specifically for this site in our configuration, - and thus show in the "Final Results".

Now another example, "ad.doubleclick.net":

+ +

 { +block{Domains starts with "ad"} }
+ +set-image-blocker {pattern}
+
+
+ +

+ Notice that the only difference from the previous listing is in + "fast-redirects" and "session-cookies-only", which are activated + specifically for this site in our configuration, and thus show in + the "Final Results". +

+

+ Now another example, "ad.doubleclick.net": +

+

+

+ + +
+
+
 { +block{Domains starts with "ad"} }
   ad*.
 
  { +block{Domain contains "ad"} }
   .ad.
 
  { +block{Doubleclick banner server} +handle-as-image }
-  .[a-vx-z]*.doubleclick.net

We'll just show the interesting part here - the explicit matches. It is - matched three different times. Two "+block{}" sections, - and a "+block{} +handle-as-image", - which is the expanded form of one of our aliases that had been defined as: - "+block-as-image". ("Aliases" are defined in - the first section of the actions file and typically used to combine more - than one action.)

Any one of these would have done the trick and blocked this as an unwanted - image. This is unnecessarily redundant since the last case effectively - would also cover the first. No point in taking chances with these guys - though ;-) Note that if you want an ad or obnoxious - URL to be invisible, it should be defined as "ad.doubleclick.net" - is done here -- as both a "+block{}" - and an - "+handle-as-image". - The custom alias "+block-as-image" just - simplifies the process and make it more readable.

One last example. Let's try "http://www.example.net/adsl/HOWTO/". - This one is giving us problems. We are getting a blank page. Hmmm ...

+ +

 Matches for http://www.example.net/adsl/HOWTO/:
+  .[a-vx-z]*.doubleclick.net
+
+
+ +

+ We'll just show the interesting part here - the explicit matches. + It is matched three different times. Two "+block{}" sections, and a "+block{} +handle-as-image", which is the expanded + form of one of our aliases that had been defined as: "+block-as-image". ("Aliases" are defined in the first section of + the actions file and typically used to combine more than one + action.) +

+

+ Any one of these would have done the trick and blocked this as an + unwanted image. This is unnecessarily redundant since the last case + effectively would also cover the first. No point in taking chances + with these guys though ;-) Note that if you want an ad or obnoxious + URL to be invisible, it should be defined as "ad.doubleclick.net" is done here -- as both a "+block{}" and a "+handle-as-image". The custom alias "+block-as-image" + just simplifies the process and makes it more readable. +

+

+ One last example. Let's try "http://www.example.net/adsl/HOWTO/". This one is + giving us problems. We are getting a blank page. Hmmm ... +

+

+

+ + +
+
+
 Matches for http://www.example.net/adsl/HOWTO/:
 
- In file: default.action [ View ] [ Edit ]
+ In file: default.action [ View ] [ Edit ]
 
- {-add-header 
+ {-add-header
   -block
-  +change-x-forwarded-for{block} 
+  +change-x-forwarded-for{block}
   -client-header-filter{hide-tor-exit-notation}
   -content-type-overwrite
   -crunch-client-header
@@ -1831,8 +922,8 @@ CLASS="GUIBUTTON"
   -crunch-incoming-cookies
   -crunch-outgoing-cookies
   -crunch-server-header
-  +deanimate-gifs 
-  -downgrade-http-version 
+  +deanimate-gifs
+  -downgrade-http-version
   +fast-redirects {check-decoded-url}
   -filter {js-events}
   -filter {content-cookies}
@@ -1862,326 +953,212 @@ CLASS="GUIBUTTON"
   -filter {no-ping}
   -force-text-mode
   -handle-as-empty-document
-  -handle-as-image 
+  -handle-as-image
   -hide-accept-language
-  -hide-content-disposition  
-  +hide-from-header{block} 
-  +hide-referer{forge} 
-  -hide-user-agent 
+  -hide-content-disposition
+  +hide-from-header{block}
+  +hide-referer{forge}
+  -hide-user-agent
   -overwrite-last-modified
-  +prevent-compression 
+  +prevent-compression
   -redirect
   -server-header-filter{xml-to-html}
-  -server-header-filter{html-to-xml} 
-  +session-cookies-only 
+  -server-header-filter{html-to-xml}
+  +session-cookies-only
   +set-image-blocker{blank} }
    /
 
  { +block{Path contains "ads".} +handle-as-image }
-  /ads

Ooops, the "/adsl/" is matching "/ads" in our - configuration! But we did not want this at all! Now we see why we get the - blank page. It is actually triggering two different actions here, and - the effects are aggregated so that the URL is blocked, and Privoxy is told - to treat the block as if it were an image. But this is, of course, all wrong. - We could now add a new action below this (or better in our own - user.action file) that explicitly - un blocks ( - "{-block}") paths with - "adsl" in them (remember, last match in the configuration - wins). There are various ways to handle such exceptions. Example:


 { -block }
-  /adsl

Now the page displays ;-) - Remember to flush your browser's caches when making these kinds of changes to - your configuration to insure that you get a freshly delivered page! Or, try - using Shift+Reload.

But now what about a situation where we get no explicit matches like - we did with:


 { +block{Path starts with "ads".} +handle-as-image }
- /ads

That actually was very helpful and pointed us quickly to where the problem - was. If you don't get this kind of match, then it means one of the default - rules in the first section of default.action is causing - the problem. This would require some guesswork, and maybe a little trial and - error to isolate the offending rule. One likely cause would be one of the - "+filter" actions. - These tend to be harder to troubleshoot. - Try adding the URL for the site to one of aliases that turn off - "+filter":

+ +

 { shop }
+  /ads
+
+
+ +

+ Oops, the "/adsl/" is matching "/ads" in our configuration! But we did not + want this at all! Now we see why we get the blank page. It is + actually triggering two different actions here, and the effects are + aggregated so that the URL is blocked, and Privoxy is told to treat the block as if it + were an image. But this is, of course, all wrong. We could now add + a new action below this (or better in our own user.action file) that explicitly unblocks ("{-block}") paths with "adsl" in them (remember, last match in the + configuration wins). There are various ways to handle such + exceptions. Example: +

+

+

+ + + + +
+
+
 { -block }
+  /adsl
+
+
+ +
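+ The "last match wins" aggregation behind this fix can be sketched as below. This is a toy model in Python rather than Privoxy's own logic: the RULES list is a hypothetical condensed stand-in for the sections shown above, and treating a path pattern as a regular expression anchored only at the start of the path is an assumption.

```python
import re

# (path pattern, action settings) in file order. Hypothetical, condensed
# stand-in for the default rule, the "/ads" block and the "/adsl" exception.
RULES = [
    (r"/", {"block": False, "handle-as-image": False}),
    (r"/ads", {"block": True, "handle-as-image": True}),
    (r"/adsl", {"block": False}),
]


def final_actions(path, rules=RULES):
    """Apply every matching section in order; later settings override earlier ones."""
    result = {}
    for pattern, settings in rules:
        # re.match anchors at the start of the path only, so "/ads"
        # also matches "/adsl/HOWTO/".
        if re.match(pattern, path):
            result.update(settings)
    return result


print(final_actions("/adsl/HOWTO/"))
# {'block': False, 'handle-as-image': True}
print(final_actions("/ads/banner.gif"))
# {'block': True, 'handle-as-image': True}
```

+ In this model the exception only switches "block" off again. The "handle-as-image" setting from the "/ads" section is still aggregated into the result, but it has no effect once the request is no longer blocked.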

+ Now the page displays ;-) Remember to flush your browser's caches + when making these kinds of changes to your configuration to ensure + that you get a freshly delivered page! Or, try using Shift+Reload. +

+

+ But now what about a situation where we get no explicit matches + like we did with: +

+

+

+ + + + +
+
+
 { +block{Path starts with "ads".} +handle-as-image }
+ /ads
+
+
+ +

+ That actually was very helpful and pointed us quickly to where the + problem was. If you don't get this kind of match, then it means one + of the default rules in the first section of default.action is causing the problem. This would + require some guesswork, and maybe a little trial and error to + isolate the offending rule. One likely cause would be one of the "+filter" actions. These tend to be harder to + troubleshoot. Try adding the URL for the site to one of the aliases + that turn off "+filter": +

+

+

+ + +
+
+
 { shop }
  .quietpc.com
  .worldpay.com   # for quietpc.com
  .jungle.com
  .scan.co.uk
- .forbes.com

"{ shop }" is an "alias" that expands to - "{ -filter -session-cookies-only }". - Or you could do your own exception to negate filtering:

+ +

 { -filter }
+ .forbes.com
+
+
+ +

+ "{ shop }" is + an "alias" that expands to "{ -filter -session-cookies-only + }". Or you could do your own exception to negate + filtering: +

+

+

+ + +
+
+
 { -filter }
  # Disable ALL filter actions for sites in this section
  .forbes.com
  developer.ibm.com
- localhost

This would turn off all filtering for these sites. This is best - put in user.action, for local site - exceptions. Note that when a simple domain pattern is used by itself (without - the subsequent path portion), all sub-pages within that domain are included - automatically in the scope of the action.

Images that are inexplicably being blocked, may well be hitting the -"+filter{banners-by-size}" - rule, which assumes - that images of certain sizes are ad banners (works well - most of the time since these tend to be standardized).

"{ fragile }" is an alias that disables most - actions that are the most likely to cause trouble. This can be used as a - last resort for problem sites.

+ +

 { fragile }
+ localhost
+
+
+ +

+ This would turn off all filtering for these sites. This is best put + in user.action, for local site + exceptions. Note that when a simple domain pattern is used by + itself (without the subsequent path portion), all sub-pages within + that domain are included automatically in the scope of the action. +

+

+ Images that are inexplicably being blocked may well be hitting the + "+filter{banners-by-size}" rule, which assumes + that images of certain sizes are ad banners (works well most of the time + since these tend to be standardized). +

+

+ "{ fragile }" + is an alias that disables most of the actions that are most likely to + cause trouble. This can be used as a last resort for problem sites. +

+

+

+ + +
+
+
 { fragile }
  # Handle with care: easy to break
  mail.google.
- mybank.example.com

Remember to flush caches! Note that the - mail.google reference lacks the TLD portion (e.g. - ".com"). This will effectively match any TLD with - google in it, such as mail.google.de., - just as an example.

- If this still does not work, you will have to go through the remaining - actions one by one to find which one(s) is causing the problem.


PrevHome 
See Also  
\ No newline at end of file + mybank.example.com +
+
+ +

+ Remember to flush + caches! Note that the mail.google reference lacks the TLD portion (e.g. + ".com"). This will effectively match + mail.google under any TLD, such as mail.google.de, just as an example. +
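+ The effect of leaving off the TLD can be illustrated with a small sketch. This is Python written for illustration only, not Privoxy code: the helper name matches_open_right is hypothetical, and it assumes that a pattern ending in a dot is compared label by label from the left, with anything allowed after it.

```python
def matches_open_right(pattern: str, host: str) -> bool:
    """Model a pattern with a trailing dot: fixed on the left, open on the right."""
    p_labels = pattern.rstrip(".").lower().split(".")
    h_labels = host.lower().split(".")
    # The host must begin with exactly the pattern's labels; whatever
    # follows (the TLD) is not constrained.
    return h_labels[:len(p_labels)] == p_labels


print(matches_open_right("mail.google.", "mail.google.com"))  # True
print(matches_open_right("mail.google.", "mail.google.de"))   # True
print(matches_open_right("mail.google.", "www.google.com"))   # False
```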

+

+ If this still does not work, you will have to go through the + remaining actions one by one to find which one(s) is causing the + problem. +

+
+
+ + + +