Privoxy 3.0.23 User Manual

14. Appendix

14.1. Regular Expressions

Privoxy uses Perl-style "regular expressions" in its actions files and filter file, through the PCRE and PCRS libraries.

If you are reading this, you probably don't understand what "regular expressions" are, or what they can do. So this will be a very brief introduction only. A full explanation would require a book ;-)

Regular expressions provide a language to describe patterns that can be run against strings of characters (letters, numbers, etc.), to see if they match the string or not. The patterns are themselves (sometimes complex) strings of literal characters, combined with wild-cards and other special characters, called meta-characters. The "meta-characters" have special meanings and are used to build complex patterns to be matched against. Perl Compatible Regular Expressions are an especially convenient "dialect" of the regular expression language.

To make a simple analogy, we do something similar when we use wild-card characters when listing files with the dir command in DOS. *.* matches all filenames. The "special" character here is the asterisk, which matches any and all characters. We can be more specific and use ? to match just individual characters. So "dir file?.txt" would match "file1.txt", "file2.txt", etc. We are pattern matching, using a similar technique to "regular expressions"!
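The wild-card analogy can be tried outside of DOS as well. The following is a minimal sketch, assuming Python's standard fnmatch module as a stand-in for the dir command; the file names are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical directory listing, used only for illustration.
names = ["file1.txt", "file2.txt", "file10.txt", "notes.txt"]

# "?" stands for exactly one character, so "file10.txt" is left out.
single = [n for n in names if fnmatchcase(n, "file?.txt")]

# "*" stands for any run of characters, so "*.*" accepts every name here.
anything = [n for n in names if fnmatchcase(n, "*.*")]

print(single)
print(anything)
```

Shell-style wild-cards like these are a much smaller language than regular expressions, which is why the two use "?" and "*" differently.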

Regular expressions do essentially the same thing, but are much, much more powerful. There are many more "special characters" and ways of building complex patterns. Let's look at a few of the common ones, and then some examples:

. - Matches any single character, e.g. "a", "A", "4", ":", or "@".

? - The preceding character or expression is matched ZERO or ONE times. Either/or.

+ - The preceding character or expression is matched ONE or MORE times.

* - The preceding character or expression is matched ZERO or MORE times.

\ - The "escape" character denotes that the following character should be taken literally. This is used where one of the special characters (e.g. ".") needs to be taken literally and not as a special meta-character. Example: "example\.com" makes sure the period is recognized only as a period (and not expanded to its meta-character meaning of any single character).

[ ] - Characters enclosed in brackets will be matched if any of the enclosed characters are encountered. For instance, "[0-9]" matches any numeric digit (zero through nine). As an example, we can combine this with "+" to match any digit one or more times: "[0-9]+".

( ) - Parentheses are used to group a sub-expression, or multiple sub-expressions.

| - The "bar" character works like an "or" conditional statement. A match is successful if the sub-expression on either side of "|" matches. As an example: "/(this|that) example/" uses grouping and the bar character and would match either "this example" or "that example", and nothing else.
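Each of the meta-characters above can be checked in isolation. This is only a sketch: it assumes Python's re module, whose handling of these particular meta-characters agrees with the Perl-style syntax described here, and the helper name matches is made up for the example.

```python
import re

def matches(pattern, text):
    """Return True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

checks = {
    # "." accepts any single character, here "@".
    "dot": matches(r"a.c", "a@c"),
    # "?" makes the preceding "e" optional.
    "question": matches(r"jpe?g", "jpg"),
    # "+" needs at least one digit, "*" also accepts none at all.
    "plus": matches(r"[0-9]+", "2024"),
    "plus_empty": matches(r"[0-9]+", ""),
    "star_empty": matches(r"[0-9]*", ""),
    # An escaped "." only accepts a literal period.
    "escaped_dot": matches(r"example\.com", "exampleXcom"),
    "plain_dot": matches(r"example.com", "exampleXcom"),
    # "|" inside a group offers two alternatives.
    "alternation": matches(r"(this|that) example", "that example"),
    "alternation_miss": matches(r"(this|that) example", "other example"),
}

for name, result in checks.items():
    print(name, result)
```

Only "plus_empty", "escaped_dot" and "alternation_miss" come out False; every other check is True.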


These are just some of the ones you are likely to use when matching URLs with Privoxy, and are a long way from a definitive list. This is enough to get us started with a few simple examples which may be more illuminating:

/.*/banners/.* - A simple example that uses the common combination of "." and "*" to denote any character, zero or more times. In other words, any string at all. So we start with a literal forward slash, then our regular expression pattern (".*"), another literal forward slash, the string "banners", another forward slash, and lastly another ".*". We are building a directory path here. This will match any file whose path has a directory named "banners" in it. The ".*" matches any characters, and this could conceivably be more forward slashes, so it might expand into a much longer looking path. For example, this could match: "/eye/hate/spammers/banners/annoy_me_please.gif", or just "/banners/annoying.html", or an almost infinite number of other possible combinations, as long as it has "banners" in the path somewhere.
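The banners pattern can be tried against a few paths. This is a sketch under two assumptions: Python's re module stands in for PCRE, and the pattern is applied from the start of the path, which is what re.match does. The helper name and the two non-matching paths are made up for illustration.

```python
import re

BANNERS = re.compile(r"/.*/banners/.*")

def is_banner_path(path):
    """Return True when the pattern matches from the start of path."""
    return BANNERS.match(path) is not None

deep = is_banner_path("/eye/hate/spammers/banners/annoy_me_please.gif")
# No "banners" directory anywhere in this path.
unrelated = is_banner_path("/images/logo.gif")
# "banner" without the trailing "s" is a different directory name.
singular = is_banner_path("/eye/hate/spammers/banner/annoy_me_please.gif")

print(deep, unrelated, singular)
```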

And now something a little more complex:

/.*/adv((er)?ts?|ertis(ing|ements?))?/ - We have several literal forward slashes again ("/"), so we are building another expression that is a file path statement. We have another ".*", so we are matching against any conceivable sub-path, as long as it matches our expression. The only true literal that must match our pattern is adv, together with the forward slashes. What comes after the "adv" string is the interesting part.

Remember the "?" means the preceding expression (either a literal character or anything grouped with "(...)" in this case) can exist or not, since this means either zero or one match. So "((er)?ts?|ertis(ing|ements?))" is optional, as are the sub-expression "(er)" and each "s" that is followed by a "?". The "|" means "or". We have two of those. For instance, "(ing|ements?)" can expand to match either "ing" OR "ements?". What is being done here is an attempt at matching as many variations of "advertisement", and similar, as possible. So this would expand to match just "adv", or "advert", or "adverts", or "advertising", or "advertisement", or "advertisements". You get the idea. But it would not match "advertizements" (with a "z"). We could fix that by changing our regular expression to: "/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/", which would then match either spelling.
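The expansion described above can be checked mechanically. This is a sketch, assuming Python's re module as a stand-in for PCRE and assuming the pattern is applied from the start of the path (re.match). The "/x/" prefix on the test paths is made up; any sub-path would do.

```python
import re

ADV = re.compile(r"/.*/adv((er)?ts?|ertis(ing|ements?))?/")
ADV_Z = re.compile(r"/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/")

words = ["adv", "advert", "adverts", "advertising",
         "advertisement", "advertisements"]

# Every variation listed in the text is accepted by the original pattern.
all_variations = all(ADV.match(f"/x/{w}/") for w in words)

# The "z" spelling is rejected by the original, accepted by the fixed one.
z_original = ADV.match("/x/advertizements/") is not None
z_fixed = ADV_Z.match("/x/advertizements/") is not None

# The fixed pattern still accepts the "s" spelling.
s_fixed = ADV_Z.match("/x/advertisements/") is not None

print(all_variations, z_original, z_fixed, s_fixed)
```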

/.*/advert[0-9]+\.(gif|jpe?g) - Again another path statement with forward slashes. Anything in the square brackets "[ ]" can be matched. This is using "0-9" as a shorthand expression to mean any digit zero through nine. It is the same as saying "0123456789". So any digit matches. The "+" means one or more of the preceding expression must be included. The preceding expression here is what is in the square brackets -- in this case, any digit zero through nine. Then, at the end, we have a grouping: "(gif|jpe?g)". This includes a "|", so this needs to match the expression on either side of that bar character. A simple "gif" on one side, and the other side will in turn match either "jpeg" or "jpg", since the "?" means the letter "e" is optional and can be matched once or not at all. So we are building an expression here to match GIF or JPEG image files. It must include the literal string "advert", then one or more digits, and a "." (which is now a literal, and not a special character, since it is escaped with "\"), and lastly either "gif", or "jpeg", or "jpg". Some possible matches would include: "//advert1.jpg", "/nasty/ads/advert1234.gif", "/banners/from/hell/advert99.jpg". It would not match "advert1.gif" (no leading slash), or "/adverts232.jpg" (the expression does not include an "s"), or "/advert1.jsp" ("jsp" is not in the expression anywhere).
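The matches and non-matches listed above can be verified with a short script. This is a sketch, assuming Python's re module as a stand-in for PCRE and assuming the pattern is applied from the start of the path (re.match); the variable names are made up for the example.

```python
import re

ADVERT = re.compile(r"/.*/advert[0-9]+\.(gif|jpe?g)")

should_match = [
    "//advert1.jpg",
    "/nasty/ads/advert1234.gif",
    "/banners/from/hell/advert99.jpg",
]
should_not_match = [
    "advert1.gif",      # no leading slash
    "/adverts232.jpg",  # the "s" is not part of the expression
    "/advert1.jsp",     # "jsp" is not one of the listed extensions
]

matched = [p for p in should_match if ADVERT.match(p)]
rejected = [p for p in should_not_match if not ADVERT.match(p)]

print(matched)
print(rejected)
```

Both lists come back unchanged, which agrees with the walk-through in the text.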

We are barely scratching the surface of regular expressions here so that you can understand the default Privoxy configuration files, and maybe use this knowledge to customize your own installation. There is much, much more that can be done with regular expressions. Now that you know enough to get started, you can learn more on your own :/

More reading on Perl Compatible Regular expressions: http://perldoc.perl.org/perlre.html

For information on regular expression based substitutions and their applications in filters, please see the filter file tutorial in this manual.

14.2. Privoxy's Internal Pages

Since Privoxy proxies each requested web page, it is easy for Privoxy to trap certain special URLs. In this way, we can talk directly to Privoxy, and see how it is configured, see how our rules are being applied, change these rules and other configuration options, and even turn Privoxy's filtering off, all with a web browser.

The URLs listed below are the special ones that allow direct access to Privoxy. Of course, Privoxy must be running to access these. If not, you will get a friendly error message. Internet access is not necessary.

- Privoxy main page:

    http://config.privoxy.org/

  There is a shortcut: http://p.p/ (but it doesn't provide a fall-back to a real page in case the request is not sent through Privoxy).

- Show information about the current configuration, including viewing and editing of actions files:

    http://config.privoxy.org/show-status

- Show the source code version numbers:

    http://config.privoxy.org/show-version

- Show the browser's request headers:

    http://config.privoxy.org/show-request

- Show which actions apply to a URL and why:

    http://config.privoxy.org/show-url-info

- Toggle Privoxy on or off. This feature can be turned off/on in the main config file. When toggled "off", Privoxy continues to run, but only as a pass-through proxy, with no actions taking place:

    http://config.privoxy.org/toggle

  Shortcuts. Turn off, then on:

    http://config.privoxy.org/toggle?set=disable

    http://config.privoxy.org/toggle?set=enable
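For scripting, the addresses above can be assembled rather than typed out. This is a minimal sketch in Python; the helper internal_url and the short page keys are made up for illustration, and only the URLs listed above are taken from this manual. Fetching them still requires a client that sends its requests through Privoxy, which is not shown here.

```python
from urllib.parse import urlencode

CONFIG_BASE = "http://config.privoxy.org/"

# Page names as listed above; the dictionary keys are illustrative only.
PAGES = {
    "main": "",
    "status": "show-status",
    "version": "show-version",
    "request": "show-request",
    "url_info": "show-url-info",
    "toggle": "toggle",
}

def internal_url(page, **params):
    """Build the address of an internal page, with optional query parameters."""
    url = CONFIG_BASE + PAGES[page]
    if params:
        url = url + "?" + urlencode(params)
    return url

status_url = internal_url("status")
disable_url = internal_url("toggle", set="disable")
enable_url = internal_url("toggle", set="enable")

print(status_url)
print(disable_url)
print(enable_url)
```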

14.3. Chain of Events

Let's take a quick look at how some of Privoxy's core features are triggered, and the ensuing sequence of events when a web page is requested by your browser:

- First, your web browser requests a web page. The browser knows to send the request to Privoxy, which will, in turn, relay the request to the remote web server after passing the following tests:

- Privoxy traps any request for its own internal CGI pages (e.g. http://p.p/) and sends the CGI page back to the browser.

- Next, Privoxy checks to see if the URL matches any "+block" patterns. If so, the URL is then blocked, and the remote web server will not be contacted. "+handle-as-image" and "+handle-as-empty-document" are then checked, and if there is no match, an HTML "BLOCKED" page is sent back to the browser. Otherwise, if it does match, an image is returned for the former, and an empty text document for the latter. The type of image would depend on the setting of "+set-image-blocker" (blank, checkerboard pattern, or an HTTP redirect to an image elsewhere).

- Untrusted URLs are blocked. If URLs are being added to the trust file, then that is done.

- If the URL pattern matches the "+fast-redirects" action, it is then processed. Unwanted parts of the requested URL are stripped.

- Now the rest of the client browser's request headers are processed. If any of these match any of the relevant actions (e.g. "+hide-user-agent", etc.), headers are suppressed or forged as determined by these actions and their parameters.

- Now the web server starts sending its response back (i.e. typically a web page).

- First, the server headers are read and processed to determine, among other things, the MIME type (document type) and encoding. The headers are then filtered as determined by the "+crunch-incoming-cookies", "+session-cookies-only", and "+downgrade-http-version" actions.

- If any "+filter" action or "+deanimate-gifs" action applies (and the document type fits the action), the rest of the page is read into memory (up to a configurable limit). Then the filter rules (from default.filter and any other filter files) are processed against the buffered content. Filters are applied in the order they are specified in one of the filter files. Animated GIFs, if present, are reduced to either the first or last frame, depending on the action setting. The entire page, which is now filtered, is then sent by Privoxy back to your browser.

  If neither a "+filter" action nor "+deanimate-gifs" matches, then Privoxy passes the raw data through to the client browser as it becomes available.

- As the browser receives the now (possibly filtered) page content, it reads and then requests any URLs that may be embedded within the page source, e.g. ad images, stylesheets, JavaScript, other HTML documents (e.g. frames), sounds, etc. For each of these objects, the browser issues a separate request (this is easily viewable in Privoxy's logs). And each such request is in turn processed just as above. Note that a complex web page will have many, many such embedded URLs. If these secondary requests are to a different server, then quite possibly a very different set of actions is triggered.
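Two of the decisions in the sequence above can be restated as a toy model. This is only a sketch of the description in this section, not Privoxy's actual code: the function names and return strings are made up, the check of the document type is left out, and the order in which "handle-as-image" and "handle-as-empty-document" are tested is an arbitrary choice, since the text does not specify it.

```python
def blocked_response(actions):
    """Describe what the browser receives, given the actions that apply to a URL."""
    if "block" not in actions:
        return "request relayed to the remote server"
    # Assumption: image is tested first; the text does not give an order.
    if "handle-as-image" in actions:
        return "image chosen by set-image-blocker"
    if "handle-as-empty-document" in actions:
        return "empty text document"
    return "HTML BLOCKED page"

def body_handling(actions):
    """Describe how the response body is passed on."""
    if "filter" in actions or "deanimate-gifs" in actions:
        return "buffered, filtered, then sent"
    return "passed through as it becomes available"

print(blocked_response({"block"}))
print(blocked_response({"block", "handle-as-image"}))
print(blocked_response({"hide-user-agent"}))
print(body_handling({"filter"}))
print(body_handling({"hide-user-agent"}))
```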

NOTE: This is a somewhat simplistic overview of what happens with each URL request. For the sake of brevity and simplicity, we have focused on Privoxy's core features only.

14.4. Troubleshooting: Anatomy of an Action

The way Privoxy applies actions and filters to any given URL can be complex, and it is not always easy to understand what is happening. Sometimes we need to be able to see just what Privoxy is doing, especially if something Privoxy is doing is inadvertently causing us a problem. It can be a little daunting to look at the actions and filters files themselves, since they tend to be filled with regular expressions whose consequences are not always so obvious.

One quick test to see whether Privoxy is causing a problem is to disable it temporarily. This should be the first troubleshooting step (be sure to flush caches afterward!). Looking at the logs is a good idea too. (Note that both the toggle feature and logging are enabled via config file settings, and may need to be turned "on".)

Another easy troubleshooting step, if you have done any customization of your installation, is to revert to the installed defaults and see if that helps. There are times when the developers get complaints about one thing or another, and the problem turns out to be related to a customized configuration.

Privoxy also provides the http://config.privoxy.org/show-url-info page that can show us very specifically how actions are being applied to any given URL. This is a big help for troubleshooting.

First, enter one URL (or partial URL) at the prompt, and then Privoxy will tell us how the current configuration will handle it. This will not help with filtering effects (i.e. the "+filter" action) from one of the filter files, since these are handled very differently and are not so easy to trap! It also will not tell you about any other URLs that may be embedded within the URL you are testing. For instance, images such as ads are expressed as URLs within the raw page source of HTML pages. So you will only get info for the actual URL that is pasted into the prompt area -- not any sub-URLs. If you want to know about embedded URLs like ads, you will have to dig those out of the HTML source. Use your browser's "View Page Source" option for this. Or right click on the ad, and grab the URL.
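Digging embedded URLs out of a saved page source can be partly automated. This is a minimal sketch, assuming Python's standard html.parser module; the class name, the helper and the sample markup are made up for illustration. It only collects "src" and "href" attributes, so URLs referenced from scripts or stylesheets are not found.

```python
from html.parser import HTMLParser

class EmbeddedURLCollector(HTMLParser):
    """Collect the values of src and href attributes while parsing."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                self.urls.append(value)

def embedded_urls(html):
    """Return the embedded URLs found in html, in document order."""
    collector = EmbeddedURLCollector()
    collector.feed(html)
    collector.close()
    return collector.urls

# Hypothetical page source, used only for illustration.
sample = (
    '<html><body>'
    '<img src="http://ads.example.com/banners/advert1.gif">'
    '<a href="/about.html">About</a>'
    '</body></html>'
)

found = embedded_urls(sample)
print(found)
```

Each URL found this way can then be pasted into the show-url-info prompt on its own.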

Let's try an example, google.com, and look at it one section at a time in a sample configuration (your real configuration may vary):

+    
+      
+        14. Appendix
+      
+      
+        
+          14.1. Regular Expressions
+        
+        
+          Privoxy uses Perl-style "regular expressions" in its actions files and filter file, through the PCRE and PCRS libraries.
+        
+        
+          If you are reading this, you probably don't understand what "regular expressions" are, or what they can
+          do. So this will be a very brief introduction only. A full
+          explanation would require a book ;-)
+        
+        
+          Regular expressions provide a language to describe patterns that
+          can be run against strings of characters (letters, numbers, etc.), to
+          see if they match the string or not. The patterns are themselves
+          (sometimes complex) strings of literal characters, combined with
+          wild-cards, and other special characters, called meta-characters.
+          The "meta-characters" have special
+          meanings and are used to build complex patterns to be matched
+          against. Perl Compatible Regular Expressions are an especially
+          convenient "dialect" of the regular
+          expression language.
+        
+        
+          To make a simple analogy, we do something similar when we use
+          wild-card characters when listing files with the dir command in DOS. *.*
+          matches all filenames. The "special"
+          character here is the asterisk which matches any and all
+          characters. We can be more specific and use ? to match just individual characters. So "dir file?.txt" would match "file1.txt", "file2.txt",
+          etc. We are pattern matching, using a similar technique to "regular expressions"!
+        
+        
+          Regular expressions do essentially the same thing, but are much,
+          much more powerful. There are many more "special characters" and ways of building complex
+          patterns however. Let's look at a few of the common ones, and then
+          some examples:
+        
+        
+          
+            
+              
+            
+          
+        
+                . -
+                Matches any single character, e.g. "a", "A", "4", ":", or
+                "@".
+              
+
+        
+          
+            
+              
+            
+          
+        
+                ? - The
+                preceding character or expression is matched ZERO or ONE
+                times. Either/or.
+              
+
+        
+          
+            
+              
+            
+          
+        
+                + - The
+                preceding character or expression is matched ONE or MORE
+                times.
+              
+
+        
+          
+            
+              
+            
+          
+        
+                * - The
+                preceding character or expression is matched ZERO or MORE
+                times.
+              
+
+        
+          
+            
+              
+            
+          
+        
+                \ - The
+                "escape" character denotes that
+                the following character should be taken literally. This is
+                used where one of the special characters (e.g. ".") needs to be taken literally and not as a
+                special meta-character. Example: "example\.com", makes sure the period is
+                recognized only as a period (and not expanded to its
+                meta-character meaning of any single character).
+              
+
+        
+          
+            
+              
+            
+          
+        
+                [ ] -
+                Characters enclosed in brackets will be matched if any of the
+                enclosed characters are encountered. For instance, "[0-9]" matches any numeric digit (zero
+                through nine). As an example, we can combine this with "+" to match any digit one or more
+                times: "[0-9]+".
+              
+
+        
+          
+            
+              
+            
+          
+        
+                ( ) -
+                parentheses are used to group a sub-expression, or multiple
+                sub-expressions.
+              
+
+        
+          
+            
+              
+            
+          
+        
+                | - The
+                "bar" character works like an
+                "or" conditional statement. A
+                match is successful if the sub-expression on either side of
+                "|" matches. As an example: "/(this|that) example/" uses grouping
+                and the bar character and would match either "this example" or "that
+                example", and nothing else.
+              
+
+        
+          These are just some of the ones you are likely to use when matching
+          URLs with Privoxy, and are a long
+          way from a definitive list. This is enough to get us started with a
+          few simple examples which may be more illuminating:
+        
+        
+          /.*/banners/.* - A simple example that
+          uses the common combination of "." and
+          "*" to denote any character, zero or
+          more times. In other words, any string at all. So we start with a
+          literal forward slash, then our regular expression pattern (".*") another literal forward slash, the
+          string "banners", another forward slash,
+          and lastly another ".*". We are building
+          a directory path here. This will match any file with the path that
+          has a directory named "banners" in it.
+          The ".*" matches any characters, and
+          this could conceivably be more forward slashes, so it might expand
+          into a much longer looking path. For example, this could match:
+          "/eye/hate/spammers/banners/annoy_me_please.gif", or
+          just "/banners/annoying.html", or almost
+          an infinite number of other possible combinations, just so it has
+          "banners" in the path somewhere.
+        
+        
+          And now something a little more complex:
+        
+        
+          /.*/adv((er)?ts?|ertis(ing|ements?))?/ -
+          We have several literal forward slashes again ("/"), so we are building another expression that is
+          a file path statement. We have another ".*", so we are matching against any conceivable
+          sub-path, just so it matches our expression. The only true literal
+          that must
+          match our pattern is adv, together with the forward slashes. What
+          comes after the "adv" string is the
+          interesting part.
+        
+        
+          Remember the "?" means the preceding
+          expression (either a literal character or anything grouped with
+          "(...)" in this case) can exist or not,
+          since this means either zero or one match. So "((er)?ts?|ertis(ing|ements?))" is optional, as are
+          the individual sub-expressions: "(er)",
+          "(ing|ements?)", and the "s". The "|" means "or". We have two of those. For instance,
+          "(ing|ements?)", can expand to match
+          either "ing" OR "ements?". What is being done here, is an attempt at
+          matching as many variations of "advertisement", and similar, as possible. So this
+          would expand to match just "adv", or
+          "advert", or "adverts", or "advertising", or "advertisement", or "advertisements". You get the idea. But it would not
+          match "advertizements" (with a "z"). We could fix that by changing our
+          regular expression to: "/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/", which
+          would then match either spelling.
+        
+        
+          /.*/advert[0-9]+\.(gif|jpe?g) - Again
+          another path statement with forward slashes. Anything in the square
+          brackets "[ ]" can be matched. This is
+          using "0-9" as a shorthand expression to
+          mean any digit zero through nine. It is the same as saying "0123456789". So any digit matches. The "+" means one or more of the preceding
+          expression must be included. The preceding expression here is what
+          is in the square brackets -- in this case, any digit zero through
+          nine. Then, at the end, we have a grouping: "(gif|jpe?g)". This includes a "|", so this needs to match the expression on either
+          side of that bar character also. A simple "gif" on one side, and the other side will in turn
+          match either "jpeg" or "jpg", since the "?"
+          means the letter "e" is optional and can
+          be matched once or not at all. So we are building an expression
+          here to match GIF or JPEG image files. It must include
+          the literal string "advert", then one or
+          more digits, and a "." (which is now a
+          literal, and not a special character, since it is escaped with
+          "\"), and lastly either "gif", or "jpeg", or
+          "jpg". Some possible matches would
+          include: "//advert1.jpg", "/nasty/ads/advert1234.gif", "/banners/from/hell/advert99.jpg". It would not
+          match "advert1.gif" (no leading slash),
+          or "/adverts232.jpg" (the expression
+          does not include an "s"), or "/advert1.jsp" ("jsp" is not in the expression anywhere).
+        
+        
+          We are barely scratching the surface of regular expressions here so
+          that you can understand the default Privoxy configuration files, and maybe use
+          this knowledge to customize your own installation. There is much,
+          much more that can be done with regular expressions. Now that you
+          know enough to get started, you can learn more on your own :/
+        
+        
+          More reading on Perl Compatible Regular expressions: http://perldoc.perl.org/perlre.html
+        
+        
+          For information on regular expression based substitutions and their
+          applications in filters, please see the filter file tutorial in this manual.
+        
+      
+      
+        
+          14.2. Privoxy's Internal Pages
+        
+        
+          Since Privoxy proxies each
+          requested web page, it is easy for Privoxy to trap certain special URLs. In this
+          way, we can talk directly to Privoxy, and see how it is configured, see how
+          our rules are being applied, change these rules and other
+          configuration options, and even turn Privoxy's filtering off, all with a web
+          browser.
+        
+        
+          The URLs listed below are the special ones that allow direct access
+          to Privoxy. Of course, Privoxy must be running to access these.
+          If not, you will get a friendly error message. Internet access is
+          not necessary either.
+        
+        
+        
+        
+          
+            
+              Privoxy main page:
+            
+            
+            
+              
+                http://config.privoxy.org/
+              
+            
+            
+              There is a shortcut: http://p.p/ (But it doesn't provide a fall-back to a
+              real page, in case the request is not sent through Privoxy)
+            
+          
+          
+            
+              Show information about the current configuration, including
+              viewing and editing of actions files:
+            
+            
+            
+              
+                http://config.privoxy.org/show-status
+              
+            
+          
+          
+            
+              Show the source code version numbers:
+            
+            
+            
+              
+                http://config.privoxy.org/show-version
+              
+            
+          
+          
+            
+              Show the browser's request headers:
+            
+            
+            
+              
+                http://config.privoxy.org/show-request
+              
+            
+          
+          
+            
+              Show which actions apply to a URL and why:
+            
+            
+            
+              
+                http://config.privoxy.org/show-url-info
+              
+            
+          
+          
+            
+              Toggle Privoxy on or off. This feature can be turned off/on in
+              the main config file. When toggled
+              "off", "Privoxy" continues to run, but only as a
+              pass-through proxy, with no actions taking place:
+            
+            
+            
+              
+                http://config.privoxy.org/toggle
+              
+            
+            
+              Short cuts. Turn off, then on:
+            
+            
+            
+              
+                http://config.privoxy.org/toggle?set=disable
+              
+            
+            
+            
+              
+                http://config.privoxy.org/toggle?set=enable
+              
+            
+          
+        
+      
+      
+        
+          14.3. Chain of Events
+        
+        
+          Let's take a quick look at how some of Privoxy's core features are triggered, and the
+          ensuing sequence of events when a web page is requested by your
+          browser:
+        
+        
+        
+        
+          
+            
+              First, your web browser requests a web page. The browser knows
+              to send the request to Privoxy, which will, in turn, relay the
+              request to the remote web server after passing the following
+              tests:
+            
+          
+          
+            
+              Privoxy traps any request for
+              its own internal CGI pages (e.g. http://p.p/) and sends the CGI page back to the
+              browser.
+            
+          
+          
+            
+              Next, Privoxy checks to see if
+              the URL matches any "+block" patterns. If so, the URL is
+              then blocked, and the remote web server will not be contacted.
+              "+handle-as-image" and "+handle-as-empty-document" are then
+              checked, and if there is no match, an HTML "BLOCKED" page is sent back to the browser.
+              Otherwise, if it does match, an image is returned for the
+              former, and an empty text document for the latter. The type of
+              image would depend on the setting of "+set-image-blocker" (blank, checkerboard
+              pattern, or an HTTP redirect to an image elsewhere).
+            
+          
+          
+            
+              Untrusted URLs are blocked. If URLs are being added to the trust file, then that is done.
+            
+          
+          
+            
+              If the URL pattern matches the "+fast-redirects" action, it is then
+              processed. Unwanted parts of the requested URL are stripped.
+            
+          
+          
+            
+              Now the rest of the client browser's request headers are
+              processed. If any of these match any of the relevant actions
+              (e.g. "+hide-user-agent", etc.), headers are
+              suppressed or forged as determined by these actions and their
+              parameters.
+            
+          
+          
+            
+              Now the web server starts sending its response back (i.e.
+              typically a web page).
+            
+          
+          
+            
+              First, the server headers are read and processed to determine,
+              among other things, the MIME type (document type) and encoding.
+              The headers are then filtered as determined by the "+crunch-incoming-cookies", "+session-cookies-only", and "+downgrade-http-version" actions.
+            
+          
+          
+            
+              If any "+filter" action or "+deanimate-gifs" action applies (and the
+              document type fits the action), the rest of the page is read
+              into memory (up to a configurable limit). Then the filter rules
+              (from default.filter and any other
+              filter files) are processed against the buffered content.
+              Filters are applied in the order they are specified in one of
+              the filter files. Animated GIFs, if present, are reduced to
+              either the first or last frame, depending on the action
+              setting. The entire page, which is now filtered, is then sent by
+              Privoxy back to your browser.
+            
+            
+              If neither a "+filter" action nor a "+deanimate-gifs" action matches, then Privoxy passes the raw data through to the
+              client browser as it becomes available.
+            
+          
+          
+            
+              As the browser receives the (now possibly filtered) page
+              content, it reads and then requests any URLs that may be
+              embedded within the page source, e.g. ad images, stylesheets,
+              JavaScript, other HTML documents (e.g. frames), sounds, etc.
+              For each of these objects, the browser issues a separate
+              request (this is easily viewable in Privoxy's logs). And each such request is
+              in turn processed just as above. Note that a complex web page
+              will have many, many such embedded URLs. If these secondary
+              requests are to a different server, then quite possibly a very
+              differing set of actions is triggered.
+            
+          
+        
+
+        
+          NOTE: This is a somewhat simplistic overview of what happens
+          with each URL request. For the sake of brevity and simplicity, we
+          have focused on Privoxy's core
+          features only.
+        
+      
+      
+        
+          14.4. Troubleshooting: Anatomy of an
+          Action
+        
+        
+          The way Privoxy applies actions and filters to any given URL can be
+          complex, and it is not always easy to understand what is happening.
+          Sometimes we need to see just what Privoxy is doing, especially if something
+          Privoxy is doing is inadvertently causing us a
+          problem. It can be a little daunting to look at the
+          actions and filters files themselves, since they tend to be filled
+          with regular expressions whose
+          consequences are not always so obvious.
+        
+        
+          One quick test to see if Privoxy
+          is causing a problem or not, is to disable it temporarily. This
+          should be the first troubleshooting step (be sure to flush caches
+          afterward!). Looking at the logs is a good idea too. (Note that
+          both the toggle feature and logging are enabled via config file settings, and may need to be turned
+          "on".)
+        
+        
+          Another easy troubleshooting step, if you have customized your
+          installation, is to revert to the installed
+          defaults and see if that helps. The developers sometimes get
+          complaints about one thing or another when the problem is really
+          caused by a customized configuration.
+        
+        
+          Privoxy also provides the http://config.privoxy.org/show-url-info page that can
+          show us very specifically how actions are being applied to any given URL.
+          This is a big help for troubleshooting.
+        
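The same lookup can be scripted instead of typed into the form. The following is a minimal Python sketch, assuming that the show-url-info page reads the URL to test from a "url" query parameter and that Privoxy listens on its default address 127.0.0.1:8118. Both assumptions should be checked against your own installation, and the helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import ProxyHandler, build_opener

# Assumption: the CGI page reads the URL to test from a "url" parameter.
SHOW_URL_INFO = "http://config.privoxy.org/show-url-info"


def show_url_info_link(url):
    """Build the show-url-info address for the given URL (or partial URL)."""
    return SHOW_URL_INFO + "?" + urlencode({"url": url})


def fetch_show_url_info(url, proxy="http://127.0.0.1:8118"):
    """Fetch the report as HTML.

    The request has to go through Privoxy, because config.privoxy.org
    is answered by Privoxy itself and not by a real web server.
    """
    opener = build_opener(ProxyHandler({"http": proxy}))
    with opener.open(show_url_info_link(url), timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(show_url_info_link("www.google.com"))
```

Building the link separately from fetching it makes it easy to paste the address into a browser that is already configured to use Privoxy.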
+        
+          First, enter one URL (or partial URL) at the prompt, and then Privoxy will tell us how the current
+          configuration will handle it. This will not help with filtering
+          effects (i.e. the "+filter" action) from one of the filter files
+          since this is handled very differently and is not so easy to trap! It
+          also will not tell you about any other URLs that may be embedded
+          within the URL you are testing. For instance, images such as ads
+          are expressed as URLs within the raw page source of HTML pages. So
+          you will only get info for the actual URL that is pasted into the
+          prompt area -- not any sub-URLs. If you want to know about embedded
+          URLs like ads, you will have to dig those out of the HTML source.
+          Use your browser's "View Page Source"
+          option for this. Or right click on the ad, and grab the URL.
+        
+        
+          Let's try an example, google.com, and look at it one section at a time in a
+          sample configuration (your real configuration may vary):
+        
+        
+        
+        
+          
+            
-        
-      
+  Matches for http://www.google.com:
 
  In file: default.action [ View ] [ View ] [ Edit ]
 (no matches in this file)
 
-          
-
-      This is telling us how we have defined our "actions",
-      and which ones match for our test case, "google.com". Displayed is all the actions that are
-      available to us. Remember, the + sign denotes
-      "on". - denotes
-      "off". So some are "on" here, but many are "off". Each example we try may provide a slightly
-      different end result, depending on our configuration directives.
-
-      The first listing is for our default.action file. The large, multi-line listing, is
-      how the actions are set to match for all URLs, i.e. our default
-      settings. If you look at your "actions"
-      file, this would be the section just below the "aliases" section near the top. This will apply to all
-      URLs as signified by the single forward slash at the end of the listing
-      -- " / ".
-
-      But we have defined additional actions that would be exceptions to
-      these general rules, and then we list specific URLs (or patterns) that
-      these exceptions would apply to. Last match wins. Just below this then
-      are two explicit matches for ".google.com".
-      The first is negating our previous cookie setting, which was for
-      "+session-cookies-only" (i.e. not persistent). So we
-      will allow persistent cookies for google, at least that is how it is in
-      this example. The second turns off any "+fast-redirects" action, allowing this to take
-      place unmolested. Note that there is a leading dot here -- ".google.com". This will match any hosts and
-      sub-domains, in the google.com domain also, such as "www.google.com" or "mail.google.com". But it would not match "www.google.de"! So, apparently, we have these two
-      actions defined as exceptions to the general rules at the top somewhere
-      in the lower part of our default.action file,
-      and "google.com" is referenced somewhere in
-      these latter sections.
-
-      Then, for our user.action file, we again
-      have no hits. So there is nothing google-specific that we might have
-      added to our own, local configuration. If there was, those actions
-      would over-rule any actions from previously processed files, such as
-      default.action. user.action typically has the last word. This is the
-      best place to put hard and fast exceptions,
-
-      And finally we pull it all together in the bottom section and
-      summarize how Privoxy is applying all
-      its "actions" to "google.com":
-
-      
-        
-          
+          
+        
-            +            
+
+        
+          This is telling us how we have defined our "actions", and which ones match for our test
+          case, "google.com". Displayed are all the
+          actions that are available to us. Remember, the + sign denotes "on". - denotes "off". So
+          some are "on" here, but many are "off". Each example we try may provide a
+          slightly different end result, depending on our configuration
+          directives.
+        
+        
+          The first listing is for our default.action file. The large, multi-line listing,
+          is how the actions are set to match for all URLs, i.e. our default
+          settings. If you look at your "actions"
+          file, this would be the section just below the "aliases" section near the top. This will apply to
+          all URLs as signified by the single forward slash at the end of the
+          listing -- " / ".
+        
+        
+          But we have defined additional actions that would be exceptions to
+          these general rules, and then we list specific URLs (or patterns)
+          that these exceptions would apply to. Last match wins. Just below
+          this then are two explicit matches for ".google.com". The first is negating our previous
+          cookie setting, which was for "+session-cookies-only" (i.e. not persistent).
+          So we will allow persistent cookies for google, at least that is
+          how it is in this example. The second turns off any "+fast-redirects" action, allowing this to take
+          place unmolested. Note that there is a leading dot here -- ".google.com". This will match any hosts and
+          sub-domains, in the google.com domain also, such as "www.google.com" or "mail.google.com". But it would not match "www.google.de"! So, apparently, we have these
+          two actions defined as exceptions to the general rules at the top
+          somewhere in the lower part of our default.action file, and "google.com" is referenced somewhere in these latter
+          sections.
+        
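The effect of the leading dot can be illustrated with a small Python sketch. This is a simplified model of the domain matching described above, not Privoxy's own implementation: a leading dot leaves the left side of the pattern open, a trailing dot (as in the "mail.google." and "ad*." patterns used elsewhere in this section) leaves the right side open, and each label is compared with shell-style wildcards. The function name is hypothetical.

```python
from fnmatch import fnmatchcase


def domain_matches(pattern, host):
    """Simplified model of a domain pattern match (not Privoxy's own code)."""
    open_left = pattern.startswith(".")    # ".google.com": any sub-domain
    open_right = pattern.endswith(".")     # "mail.google.": any TLD
    want = pattern.strip(".").lower().split(".")
    have = host.lower().split(".")
    if len(want) > len(have):
        return False
    if open_left and open_right:
        starts = range(len(have) - len(want) + 1)
    elif open_left:
        starts = [len(have) - len(want)]   # compare the rightmost labels
    elif open_right:
        starts = [0]                       # compare the leftmost labels
    elif len(want) == len(have):
        starts = [0]                       # anchored on both sides
    else:
        return False
    return any(
        all(fnmatchcase(h, w) for h, w in zip(have[s:s + len(want)], want))
        for s in starts
    )


if __name__ == "__main__":
    for host in ("www.google.com", "mail.google.com", "www.google.de"):
        print(host, domain_matches(".google.com", host))
```

Comparing whole labels rather than raw substrings is what keeps ".google.com" from matching "www.google.de".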
+        
+          Then, for our user.action file, we again
+          have no hits. So there is nothing google-specific that we might
+          have added to our own, local configuration. If there was, those
+          actions would over-rule any actions from previously processed
+          files, such as default.action. user.action typically has the last word. This is
+          the best place to put hard and fast exceptions.
+        
+        
+          And finally we pull it all together in the bottom section and
+          summarize how Privoxy is applying
+          all its "actions" to "google.com":
+        
+        
+        
+        
+          
+            
-        
-      
+  Final results:
 
  -add-header
@@ -734,23 +835,27 @@ In file: user.action [ View ] 
-          
-
-      Notice the only difference here to the previous listing, is to
-      "fast-redirects" and "session-cookies-only", which are activated specifically
-      for this site in our configuration, and thus show in the "Final Results".
-
-      Now another example, "ad.doubleclick.net":
-
-      
-        
-          
+          
+        
-            +            
+
+        
+          Notice that the only difference from the previous listing is in
+          "fast-redirects" and "session-cookies-only", which are activated
+          specifically for this site in our configuration, and thus show in
+          the "Final Results".
+        
+        
+          Now another example, "ad.doubleclick.net":
+        
+        
+        
+        
+          
+            
-        
-      
+  { +block{Domains starts with "ad"} }
   ad*.
 
@@ -760,41 +865,48 @@ In file: user.action [ View ] 
-          
-
-      We'll just show the interesting part here - the explicit matches. It
-      is matched three different times. Two "+block{}" sections, and a "+block{}
-      +handle-as-image", which is the expanded form of one of our
-      aliases that had been defined as: "+block-as-image". ("Aliases"
-      are defined in the first section of the actions file and typically used
-      to combine more than one action.)
-
-      Any one of these would have done the trick and blocked this as an
-      unwanted image. This is unnecessarily redundant since the last case
-      effectively would also cover the first. No point in taking chances with
-      these guys though ;-) Note that if you want an ad or obnoxious URL to
-      be invisible, it should be defined as "ad.doubleclick.net" is done here -- as both a "+block{}"
-      and an "+handle-as-image". The custom alias "+block-as-image" just
-      simplifies the process and make it more readable.
-
-      One last example. Let's try "http://www.example.net/adsl/HOWTO/". This one is giving
-      us problems. We are getting a blank page. Hmmm ...
-
-      
-        
-          
+          
+        
-            +            
+
+        
+          We'll just show the interesting part here - the explicit matches.
+          It is matched three different times. Two "+block{}" sections, and a "+block{} +handle-as-image", which is the expanded
+          form of one of our aliases that had been defined as: "+block-as-image". ("Aliases" are defined in the first section of
+          the actions file and typically used to combine more than one
+          action.)
+        
+        
+          Any one of these would have done the trick and blocked this as an
+          unwanted image. This is unnecessarily redundant since the last case
+          effectively would also cover the first. No point in taking chances
+          with these guys though ;-) Note that if you want an ad or obnoxious
+          URL to be invisible, it should be defined as "ad.doubleclick.net" is done here -- as both a "+block{}" and an "+handle-as-image". The custom alias "+block-as-image"
+          just simplifies the process and makes it more readable.
+        
+        
+          One last example. Let's try "http://www.example.net/adsl/HOWTO/". This one is
+          giving us problems. We are getting a blank page. Hmmm ...
+        
+        
+        
+        
+          
+            
-        
-      
+  Matches for http://www.example.net/adsl/HOWTO/:
 
  In file: default.action [ View ] [ View ] 
-          
-
-      Ooops, the "/adsl/" is matching
-      "/ads" in our configuration! But we did not
-      want this at all! Now we see why we get the blank page. It is actually
-      triggering two different actions here, and the effects are aggregated
-      so that the URL is blocked, and Privoxy is told to treat the block as if it were
-      an image. But this is, of course, all wrong. We could now add a new
-      action below this (or better in our own user.action file) that explicitly un blocks ( "{-block}")
-      paths with "adsl" in them (remember, last
-      match in the configuration wins). There are various ways to handle such
-      exceptions. Example:
-
-      
-        
-          
+          
+        
-            +            
+
+        
+          Ooops, the "/adsl/" is matching "/ads" in our configuration! But we did not
+          want this at all! Now we see why we get the blank page. It is
+          actually triggering two different actions here, and the effects are
+          aggregated so that the URL is blocked, and Privoxy is told to treat the block as if it
+          were an image. But this is, of course, all wrong. We could now add
+          a new action below this (or better in our own user.action file) that explicitly unblocks ("{-block}") paths with "adsl" in them (remember, last match in the
+          configuration wins). There are various ways to handle such
+          exceptions. Example:
+        
+        
+        
+        
+          
+            
-        
-      
+  { -block }
   /adsl
 
-          
-
-      Now the page displays ;-) Remember to flush your browser's caches
-      when making these kinds of changes to your configuration to insure that
-      you get a freshly delivered page! Or, try using Shift+Reload.
-
-      But now what about a situation where we get no explicit matches like
-      we did with:
-
-      
-        
-          
+          
+        
-            +            
+
+        
+          Now the page displays ;-) Remember to flush your browser's caches
+          when making these kinds of changes to your configuration to ensure
+          that you get a freshly delivered page! Or, try using Shift+Reload.
+        
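Why the "/adsl" exception works can be sketched in a few lines of Python. This is an illustration only, assuming that path patterns behave like regular expressions anchored at the start of the path and that every matching section is applied in file order, so that a later setting overrides an earlier one. The data structure and function name are hypothetical and are not Privoxy internals.

```python
import re

# Sections in file order: (actions, path patterns), taken from the example.
SECTIONS = [
    (["+block", "+handle-as-image"], ["/ads"]),
    (["-block"], ["/adsl"]),
]


def final_actions(path, sections):
    """Apply every matching section in order; later settings win."""
    state = {}
    for actions, patterns in sections:
        # re.match anchors at the start only, so "/ads" also matches "/adsl/...".
        if any(re.match(pattern, path) for pattern in patterns):
            for action in actions:
                state[action[1:]] = action.startswith("+")
    return state


if __name__ == "__main__":
    print(final_actions("/adsl/HOWTO/", SECTIONS[:1]))  # before the fix
    print(final_actions("/adsl/HOWTO/", SECTIONS))      # with the exception
```

With only the first section, "/adsl/HOWTO/" ends up blocked. With the exception added, "block" is switched off again, while "handle-as-image" stays on, which only matters when a request is actually blocked.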
+        
+          But now what about a situation where we get no explicit matches
+          like we did with:
+        
+        
+        
+        
+          
+            
-        
-      
+  { +block{Path starts with "ads".} +handle-as-image }
  /ads
 
-          
-
-      That actually was very helpful and pointed us quickly to where the
-      problem was. If you don't get this kind of match, then it means one of
-      the default rules in the first section of default.action is causing the problem. This would
-      require some guesswork, and maybe a little trial and error to isolate
-      the offending rule. One likely cause would be one of the "+filter"
-      actions. These tend to be harder to troubleshoot. Try adding the URL
-      for the site to one of aliases that turn off "+filter":
-
-      
-        
-          
+          
+        
-            +            
+
+        
+          That actually was very helpful and pointed us quickly to where the
+          problem was. If you don't get this kind of match, then it means one
+          of the default rules in the first section of default.action is causing the problem. This would
+          require some guesswork, and maybe a little trial and error to
+          isolate the offending rule. One likely cause would be one of the "+filter" actions. These tend to be harder to
+          troubleshoot. Try adding the URL for the site to one of the aliases
+          that turn off "+filter":
+        
+        
+        
+        
+          
+            
-        
-      
+  { shop }
  .quietpc.com
  .worldpay.com   # for quietpc.com
@@ -931,96 +1054,111 @@ In file: user.action [ View ] 
-          
-
-      "{ shop }" is an
-      "alias" that expands to "{ -filter -session-cookies-only
-      }". Or you could do your own exception to negate
-      filtering:
-
-      
-        
-          
+          
+        
-            +            
+
+        
+          "{ shop }" is
+          an "alias" that expands to "{ -filter -session-cookies-only
+          }". Or you could do your own exception to negate
+          filtering:
+        
+        
+        
+        
+          
+            
-        
-      
+  { -filter }
  # Disable ALL filter actions for sites in this section
  .forbes.com
  developer.ibm.com
  localhost
 
-          
-
-      This would turn off all filtering for these sites. This is best put
-      in user.action, for local site exceptions.
-      Note that when a simple domain pattern is used by itself (without the
-      subsequent path portion), all sub-pages within that domain are included
-      automatically in the scope of the action.
-
-      Images that are inexplicably being blocked, may well be hitting the
-      "+filter{banners-by-size}" rule, which assumes that
-      images of certain sizes are ad banners (works well most of the time since these
-      tend to be standardized).
-
-      "{ fragile }" is
-      an alias that disables most actions that are the most likely to cause
-      trouble. This can be used as a last resort for problem sites.
-
-      
-        
-          
+          
+        
-            +            
+
+        
+          This would turn off all filtering for these sites. This is best put
+          in user.action, for local site
+          exceptions. Note that when a simple domain pattern is used by
+          itself (without the subsequent path portion), all sub-pages within
+          that domain are included automatically in the scope of the action.
+        
+        
+          Images that are inexplicably being blocked, may well be hitting the
+          "+filter{banners-by-size}" rule, which assumes
+          that images of certain sizes are ad banners (works well most of the time
+          since these tend to be standardized).
+        
+        
+          "{ fragile }"
+          is an alias that disables the actions that are most likely to
+          cause trouble. This can be used as a last resort for problem sites.
+        
+        
+        
+        
+          
+            
+          
+        
+  { fragile }
  # Handle with care: easy to break
  mail.google.
  mybank.example.com
 
+            
+
+        
+          Remember to flush
+          caches! Note that the mail.google reference lacks the TLD portion (e.g.
+          ".com"). This will effectively match any
+          TLD with google in it, such as mail.google.de, just as an example.
+        
+        
+          If this still does not work, you will have to go through the
+          remaining actions one by one to find which one(s) is causing the
+          problem.
+        
+      
+    
+    
+      
+      
+        
+          
+          
+          
+        
+        
+          
+          
+          
+            Prev
+          
+            Home
+          
+             
+          

+            See Also
+          
+             
+          
+             
           
         
       
-
-      Remember to flush
-      caches! Note that the mail.google
-      reference lacks the TLD portion (e.g. ".com"). This will effectively match any TLD with
-      google in it, such as mail.google.de., just as an example.
-
-      If this still does not work, you will have to go through the
-      remaining actions one by one to find which one(s) is causing the
-      problem.
     
-  
-
-  
-    
-
-    
-      
-        
-
-        
-
-        
-      
-
-      
-        
-
-        
-
-        
-      
-    Prev Home  
See Also    
-  
-
+  
 
+

14. Appendix

14.1. Regular - Expressions

14.2. Privoxy's - Internal Pages

14.3. Chain of - Events

14.4. - Troubleshooting: Anatomy of an Action
