Privoxy 3.0.18 User Manual

14. Appendix

14.1. Regular Expressions

Privoxy uses Perl-style "regular expressions" in its actions files and filter file, through the PCRE and PCRS libraries.

If you are reading this, you probably don't understand what "regular expressions" are, or what they can do. So this will be a very brief introduction only. A full explanation would require a book ;-)

Regular expressions provide a language for describing patterns that can be run against strings of characters (letters, numbers, etc.) to see whether they match the string or not. The patterns are themselves (sometimes complex) strings of literal characters, combined with wild-cards and other special characters, called meta-characters. The "meta-characters" have special meanings and are used to build complex patterns to be matched against. Perl Compatible Regular Expressions are an especially convenient "dialect" of the regular expression language.

To make a simple analogy, we do something similar when we use wild-card characters when listing files with the dir command in DOS. *.* matches all filenames. The "special" character here is the asterisk, which matches any and all characters. We can be more specific and use ? to match just individual characters. So "dir file?.txt" would match "file1.txt", "file2.txt", etc. We are pattern matching, using a similar technique to "regular expressions"!
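The wild-card analogy can be tried out with a small sketch. It uses Python's fnmatch module, which implements Unix shell-style wild-cards rather than the DOS dir command itself; for simple patterns like these the two behave alike. The file names are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical directory listing, used only for illustration.
names = ["file1.txt", "file2.txt", "file10.txt", "notes.txt"]

# "?" stands for exactly one character, so "file10.txt" is left out.
single_char = [n for n in names if fnmatchcase(n, "file?.txt")]

# "*" stands for any run of characters, so "*.*" keeps every name with a dot.
everything = [n for n in names if fnmatchcase(n, "*.*")]

print(single_char)
print(everything)
```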

Regular expressions do essentially the same thing, but are much, much more powerful. There are many more "special characters" and ways of building complex patterns, however. Let's look at a few of the common ones, and then some examples:

. - Matches any single character, e.g. "a", "A", "4", ":", or "@".

? - The preceding character or expression is matched ZERO or ONE times. Either/or.

+ - The preceding character or expression is matched ONE or MORE times.

* - The preceding character or expression is matched ZERO or MORE times.

\ - The "escape" character denotes that the following character should be taken literally. This is used where one of the special characters (e.g. ".") needs to be taken literally and not as a special meta-character. Example: "example\.com" makes sure the period is recognized only as a period (and not expanded to its meta-character meaning of any single character).

[ ] - Characters enclosed in brackets will be matched if any of the enclosed characters are encountered. For instance, "[0-9]" matches any numeric digit (zero through nine). As an example, we can combine this with "+" to match any digit one or more times: "[0-9]+".

( ) - Parentheses are used to group a sub-expression, or multiple sub-expressions.

| - The "bar" character works like an "or" conditional statement. A match is successful if the sub-expression on either side of "|" matches. As an example: "/(this|that) example/" uses grouping and the bar character and would match either "this example" or "that example", and nothing else.
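The meta-characters above can be checked one at a time with a short sketch. It uses Python's re module as a stand-in for PCRE; the two agree on everything used here. The helper name and the "colou?r" pattern are illustrative and do not come from Privoxy, and the surrounding slashes of "/(this|that) example/" are left out because the whole string is being tested.

```python
import re

def whole_string_matches(pattern, text):
    """Return True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# (pattern, text, expected result) triples, one or two per meta-character.
examples = [
    (r"a.c", "a@c", True),              # "." is any single character
    (r"a.c", "ac", False),              # ...but there must be one
    (r"colou?r", "color", True),        # "?" makes the "u" optional
    (r"colou?r", "colour", True),
    (r"[0-9]+", "2024", True),          # "+" needs at least one digit
    (r"[0-9]+", "", False),
    (r"[0-9]*", "", True),              # "*" also accepts none at all
    (r"example\.com", "example.com", True),
    (r"example\.com", "exampleXcom", False),  # escaped "." is literal
    (r"example.com", "exampleXcom", True),    # unescaped "." is a wild-card
    (r"(this|that) example", "that example", True),
    (r"(this|that) example", "those example", False),
]

results = [whole_string_matches(p, t) for p, t, _ in examples]
for (pattern, text, expected), actual in zip(examples, results):
    print(f"{pattern!r:28} {text!r:18} {actual}")
```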

These are just some of the ones you are likely to use when matching URLs with Privoxy, and are a long way from a definitive list. This is enough to get us started with a few simple examples which may be more illuminating:

/.*/banners/.* - A simple example that uses the common combination of "." and "*" to denote any character, zero or more times. In other words, any string at all. So we start with a literal forward slash, then our regular expression pattern (".*"), another literal forward slash, the string "banners", another forward slash, and lastly another ".*". We are building a directory path here. This will match any file whose path has a directory named "banners" below the leading slash. The ".*" matches any characters, and this could conceivably be more forward slashes, so it might expand into a much longer looking path. For example, this could match: "/eye/hate/spammers/banners/annoy_me_please.gif", or just "/ads/banners/annoying.html", or an almost infinite number of other possible combinations, as long as "/banners/" appears in the path somewhere after the leading slash. (Read strictly, a bare "/banners/annoying.html" would not match, because the pattern calls for two slashes before "banners".)
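A quick way to see what this pattern accepts is to run it against a few paths. The sketch below uses Python's re module in place of PCRE and assumes the pattern is applied from the start of the URL path; the helper name is illustrative. A strict reading of the pattern needs a slash on both sides of the first ".*", so the sketch also checks a path that begins directly with "/banners/".

```python
import re

BANNERS = re.compile(r"/.*/banners/.*")

def path_matches(pattern, path):
    """Return True if the pattern matches from the start of the path."""
    return pattern.match(path) is not None

deep = path_matches(BANNERS, "/eye/hate/spammers/banners/annoy_me_please.gif")
double_slash = path_matches(BANNERS, "//banners/annoying.html")
bare = path_matches(BANNERS, "/banners/annoying.html")
unrelated = path_matches(BANNERS, "/images/logo.gif")

print(deep, double_slash, bare, unrelated)
```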

And now something a little more complex:

/.*/adv((er)?ts?|ertis(ing|ements?))?/ - We have several literal forward slashes again ("/"), so we are building another expression that is a file path statement. We have another ".*", so we are matching against any conceivable sub-path, just so it matches our expression. The only true literal that must match our pattern is "adv", together with the forward slashes. What comes after the "adv" string is the interesting part.

Remember the "?" means the preceding expression (either a literal character or anything grouped with "(...)" in this case) can exist or not, since this means either zero or one match. So "((er)?ts?|ertis(ing|ements?))" is optional, as are the sub-expression "(er)" and each "s" that is followed by a "?". The "|" means "or". We have two of those. For instance, "(ing|ements?)" can expand to match either "ing" OR "ements?". What is being done here is an attempt at matching as many variations of "advertisement", and similar, as possible. So this would expand to match just "adv", or "advert", or "adverts", or "advertising", or "advertisement", or "advertisements". You get the idea. But it would not match "advertizements" (with a "z"). We could fix that by changing our regular expression to: "/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/", which would then match either spelling.
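The expansion described above can be verified mechanically. This sketch uses Python's re module in place of PCRE and assumes the pattern is applied from the start of the URL path; the "/images/" prefix and the helper name are hypothetical, chosen only to give each word a directory to live in.

```python
import re

ADV = re.compile(r"/.*/adv((er)?ts?|ertis(ing|ements?))?/")
ADV_S_OR_Z = re.compile(r"/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/")

words = [
    "adv", "advert", "adverts", "advertising",
    "advertisement", "advertisements", "advertizements",
]

def directory_matches(pattern, word):
    """Test the pattern against a made-up path with `word` as a directory."""
    path = "/images/" + word + "/"
    return pattern.match(path) is not None

original = {w: directory_matches(ADV, w) for w in words}
widened = {w: directory_matches(ADV_S_OR_Z, w) for w in words}

for w in words:
    print(f"{w:16} original={original[w]!s:6} widened={widened[w]}")
```

Only the "z" spelling differs between the two patterns; every other word is accepted by both.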

/.*/advert[0-9]+\.(gif|jpe?g) - Again another path statement with forward slashes. Anything in the square brackets "[ ]" can be matched. This is using "0-9" as a shorthand expression to mean any digit zero through nine. It is the same as saying "0123456789". So any digit matches. The "+" means one or more of the preceding expression must be included. The preceding expression here is what is in the square brackets, in this case any digit zero through nine. Then, at the end, we have a grouping: "(gif|jpe?g)". This includes a "|", so this needs to match the expression on either side of that bar character. A simple "gif" on one side, and the other side will in turn match either "jpeg" or "jpg", since the "?" means the letter "e" is optional and can be matched once or not at all. So we are building an expression here to match a GIF or JPEG image file. It must include the literal string "advert", then one or more digits, and a "." (which is now a literal, and not a special character, since it is escaped with "\"), and lastly either "gif", or "jpeg", or "jpg". Some possible matches would include: "//advert1.jpg", "/nasty/ads/advert1234.gif", "/banners/from/hell/advert99.jpg". It would not match "advert1.gif" (no leading slash), or "/adverts232.jpg" (the expression does not include an "s"), or "/advert1.jsp" ("jsp" is not in the expression anywhere).
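The matches and non-matches listed above can be confirmed with a few lines of code. As before, this is a sketch that uses Python's re module in place of PCRE and assumes the pattern is applied from the start of the URL path.

```python
import re

ADVERT_IMAGE = re.compile(r"/.*/advert[0-9]+\.(gif|jpe?g)")

should_match = [
    "//advert1.jpg",
    "/nasty/ads/advert1234.gif",
    "/banners/from/hell/advert99.jpg",
]
should_not_match = [
    "advert1.gif",       # no leading slash
    "/adverts232.jpg",   # the expression does not allow an "s"
    "/advert1.jsp",      # "jsp" is not one of the alternatives
]

matched = [p for p in should_match + should_not_match
           if ADVERT_IMAGE.match(p) is not None]
print(matched)
```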

We are barely scratching the surface of regular expressions here so that you can understand the default Privoxy configuration files, and maybe use this knowledge to customize your own installation. There is much, much more that can be done with regular expressions. Now that you know enough to get started, you can learn more on your own :/

More reading on Perl Compatible Regular expressions: http://perldoc.perl.org/perlre.html

For information on regular expression based substitutions and their applications in filters, please see the filter file tutorial in this manual.

14.2. Privoxy's Internal Pages

Since Privoxy proxies each requested web page, it is easy for Privoxy to trap certain special URLs. In this way, we can talk directly to Privoxy, and see how it is configured, see how our rules are being applied, change these rules and other configuration options, and even turn Privoxy's filtering off, all with a web browser.

The URLs listed below are the special ones that allow direct access to Privoxy. Of course, Privoxy must be running to access these. If not, you will get a friendly error message. Internet access is not necessary either.

Privoxy main page:

    http://config.privoxy.org/

    There is a shortcut: http://p.p/ (but it doesn't provide a fall-back to a real page, in case the request is not sent through Privoxy)

Show information about the current configuration, including viewing and editing of actions files:

    http://config.privoxy.org/show-status

Show the source code version numbers:

    http://config.privoxy.org/show-version

Show the browser's request headers:

    http://config.privoxy.org/show-request

Show which actions apply to a URL and why:

    http://config.privoxy.org/show-url-info

Toggle Privoxy on or off. This feature can be turned off/on in the main config file. When toggled "off", Privoxy continues to run, but only as a pass-through proxy, with no actions taking place:

    http://config.privoxy.org/toggle

Shortcuts to turn Privoxy off, then on:

    http://config.privoxy.org/toggle?set=disable

    http://config.privoxy.org/toggle?set=enable
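These pages can also be requested from a script rather than a browser. The following is a minimal sketch, not part of Privoxy: the helper names are made up, and the proxy address 127.0.0.1:8118 is an assumption that should be replaced with whatever listen address your installation uses. The request has to go through Privoxy for the special URLs to be trapped.

```python
import urllib.request
from urllib.parse import urlencode

CONFIG_BASE = "http://config.privoxy.org/"

def internal_url(page="", **params):
    """Build the URL of one of the internal pages listed above."""
    url = CONFIG_BASE + page
    if params:
        url += "?" + urlencode(params)
    return url

def fetch_via_privoxy(url, proxy="127.0.0.1:8118"):
    """Fetch a URL through Privoxy (the proxy address is an assumption)."""
    handler = urllib.request.ProxyHandler({"http": "http://" + proxy})
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=5) as response:
        return response.read().decode("utf-8", errors="replace")

status_url = internal_url("show-status")
disable_url = internal_url("toggle", set="disable")
enable_url = internal_url("toggle", set="enable")
print(status_url, disable_url, enable_url, sep="\n")
```

With Privoxy running, fetch_via_privoxy(status_url) would return the HTML of the status page. The toggle URLs only have an effect if the toggle feature is enabled in the main config file.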

These may be bookmarked for quick reference. See next.

14.2.1. Bookmarklets

Below are some "bookmarklets" to allow you to easily access a "mini" version of some of Privoxy's special pages. They are designed for MS Internet Explorer, but should work equally well in Netscape, Mozilla, and other browsers which support JavaScript. They are designed to run directly from your bookmarks, not by clicking the links below (although that should work for testing).

To save them, right-click the link and choose "Add to Favorites" (IE) or "Add Bookmark" (Netscape). You will get a warning that the bookmark "may not be safe"; just click OK. Then you can run the bookmarklet directly from your favorites/bookmarks. For even faster access, you can put them on the "Links" bar (IE) or the "Personal Toolbar" (Netscape), and run them with a single click.

Privoxy - Enable

Privoxy - Disable

Privoxy - Toggle Privoxy (toggles between enabled and disabled)

Privoxy - View Status

Privoxy - Why?

Credit: The site which gave us the general idea for these bookmarklets is www.bookmarklets.com. They have more information about bookmarklets.

14.3. Chain of Events

Let's take a quick look at how some of Privoxy's core features are triggered, and the ensuing sequence of events when a web page is requested by your browser:

First, your web browser requests a web page. The browser knows to send the request to Privoxy, which will in turn relay the request to the remote web server after passing the following tests:

Privoxy traps any request for its own internal CGI pages (e.g. http://p.p/) and sends the CGI page back to the browser.

Next, Privoxy checks to see if the URL matches any "+block" patterns. If so, the URL is then blocked, and the remote web server will not be contacted. "+handle-as-image" and "+handle-as-empty-document" are then checked, and if there is no match, an HTML "BLOCKED" page is sent back to the browser. Otherwise, if it does match, an image is returned for the former, and an empty text document for the latter. The type of image would depend on the setting of "+set-image-blocker" (blank, checkerboard pattern, or an HTTP redirect to an image elsewhere).

Untrusted URLs are blocked. If URLs are being added to the trust file, then that is done.

If the URL pattern matches the "+fast-redirects" action, it is then processed. Unwanted parts of the requested URL are stripped.

Now the rest of the client browser's request headers are processed. If any of these match any of the relevant actions (e.g. "+hide-user-agent", etc.), headers are suppressed or forged as determined by these actions and their parameters.

Now the web server starts sending its response back (i.e. typically a web page).

First, the server headers are read and processed to determine, among other things, the MIME type (document type) and encoding. The headers are then filtered as determined by the "+crunch-incoming-cookies", "+session-cookies-only", and "+downgrade-http-version" actions.

If any "+filter" action or "+deanimate-gifs" action applies (and the document type fits the action), the rest of the page is read into memory (up to a configurable limit). Then the filter rules (from default.filter and any other filter files) are processed against the buffered content. Filters are applied in the order they are specified in one of the filter files. Animated GIFs, if present, are reduced to either the first or last frame, depending on the action setting. The entire page, which is now filtered, is then sent by Privoxy back to your browser.

If neither a "+filter" action nor a "+deanimate-gifs" action matches, then Privoxy passes the raw data through to the client browser as it becomes available.

As the browser receives the now (possibly filtered) page content, it reads and then requests any URLs that may be embedded within the page source, e.g. ad images, stylesheets, JavaScript, other HTML documents (e.g. frames), sounds, etc. For each of these objects, the browser issues a separate request (this is easily viewable in Privoxy's logs). And each such request is in turn processed just as above. Note that a complex web page will have many, many such embedded URLs. If these secondary requests are to a different server, then quite possibly a very different set of actions is triggered.

NOTE: This is a somewhat simplistic overview of what happens with each URL request. For the sake of brevity and simplicity, we have focused on Privoxy's core features only.
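The main branches of this sequence can be pictured as a small decision function. This is a toy model written for illustration only, not Privoxy's actual code: the function name, the result labels, and the idea of passing in a ready-made set of matching action names are all assumptions. It leaves out the trust file, "+fast-redirects", and all header processing, and the order in which the two "handle-as" actions are tested is a guess.

```python
INTERNAL_HOSTS = {"p.p", "config.privoxy.org"}

def classify_request(url, actions):
    """Return a label for what the overview above says happens to a request.

    `actions` is a hypothetical set of action names (without the "+")
    that already matched the URL; real Privoxy derives these from its
    actions files.
    """
    host = url.split("//", 1)[-1].split("/", 1)[0]
    if host in INTERNAL_HOSTS:
        return "internal CGI page"
    if "block" in actions:
        if "handle-as-image" in actions:
            return "blocked: image"
        if "handle-as-empty-document" in actions:
            return "blocked: empty document"
        return "blocked: HTML BLOCKED page"
    if "filter" in actions or "deanimate-gifs" in actions:
        return "forwarded: response buffered and filtered"
    return "forwarded: response passed through"

print(classify_request("http://p.p/", set()))
print(classify_request("http://ads.example.com/banner.gif",
                       {"block", "handle-as-image"}))
print(classify_request("http://www.example.com/", {"filter"}))
```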

14.4. Troubleshooting: Anatomy of an Action

The way Privoxy applies actions and filters to any given URL can be complex, and it is not always easy to understand what is happening. Sometimes we need to be able to see just what Privoxy is doing, especially if something Privoxy is doing is inadvertently causing us a problem. It can be a little daunting to look at the actions and filters files themselves, since they tend to be filled with regular expressions whose consequences are not always so obvious.

One quick test to see if Privoxy is causing a problem or not is to disable it temporarily. This should be the first troubleshooting step. See the Bookmarklets section for a quick and easy way to do this (be sure to flush caches afterward!). Looking at the logs is a good idea too. (Note that both the toggle feature and logging are enabled via config file settings, and may need to be turned "on".)

Another easy troubleshooting step, if you have done any customization of your installation, is to revert to the installed defaults and see if that helps. There are times the developers get complaints about one thing or another, and the problem turns out to be related to a customized configuration.

Privoxy also provides the http://config.privoxy.org/show-url-info page, which can show us very specifically how actions are being applied to any given URL. This is a big help for troubleshooting.

First, enter one URL (or partial URL) at the prompt, and then Privoxy will tell us how the current configuration will handle it. This will not help with filtering effects (i.e. the "+filter" action) from one of the filter files, since this is handled very differently and is not so easy to trap! It also will not tell you about any other URLs that may be embedded within the URL you are testing. For instance, images such as ads are expressed as URLs within the raw page source of HTML pages. So you will only get info for the actual URL that is pasted into the prompt area, not any sub-URLs. If you want to know about embedded URLs like ads, you will have to dig those out of the HTML source. Use your browser's "View Page Source" option for this. Or right-click on the ad, and grab the URL.
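Digging embedded URLs out of a saved page source can be automated. The sketch below is one possible approach using Python's standard html.parser module; the class and function names are made up, and the sample page is invented for the demonstration. It collects the "src" attribute of any tag plus the "href" of "link" tags (stylesheets), which covers the common cases but certainly not every way a page can pull in another URL.

```python
from html.parser import HTMLParser

class EmbeddedURLCollector(HTMLParser):
    """Collect URLs that a browser would request while rendering a page."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue
            # Images, scripts and frames use "src"; stylesheets use <link href>.
            if name == "src" or (tag == "link" and name == "href"):
                self.urls.append(value)

def embedded_urls(html_source):
    collector = EmbeddedURLCollector()
    collector.feed(html_source)
    collector.close()
    return collector.urls

# Invented sample page, used only for illustration.
sample_page = """
<html><head><link rel="stylesheet" href="/style.css"></head>
<body>
<a href="/about.html">About</a>
<img src="http://ads.example.com/banners/advert1.gif">
<script src="http://stats.example.net/track.js"></script>
</body></html>
"""

found = embedded_urls(sample_page)
print(found)
```

Each URL found this way can then be pasted into the show-url-info prompt on its own. The ordinary hyperlink to "/about.html" is deliberately skipped, since the browser only requests it when it is clicked.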

Let's try an example, google.com, and look at it one section at a time in a sample configuration (your real configuration may vary):

+    
+
+    
+      14.3. Chain of
+      Events
+
+      Let's take a quick look at how some of Privoxy's core features are triggered, and the
+      ensuing sequence of events when a web page is requested by your
+      browser:
+
+      
+        
+          First, your web browser requests a web page. The browser knows
+          to send the request to Privoxy,
+          which will in turn, relay the request to the remote web server
+          after passing the following tests:
+        
+
+        
+          Privoxy traps any request for
+          its own internal CGI pages (e.g http://p.p/) and sends the CGI page back to the
+          browser.
+        
+
+        
+          Next, Privoxy checks to see if
+          the URL matches any "+block" patterns. If so, the URL is then
+          blocked, and the remote web server will not be contacted. "+handle-as-image" and "+handle-as-empty-document" are then checked,
+          and if there is no match, an HTML "BLOCKED" page is sent back to the browser.
+          Otherwise, if it does match, an image is returned for the former,
+          and an empty text document for the latter. The type of image would
+          depend on the setting of "+set-image-blocker" (blank, checkerboard
+          pattern, or an HTTP redirect to an image elsewhere).
+        
+
+        
+          Untrusted URLs are blocked. If URLs are being added to the
+          trust file, then that is done.
+        
+
+        
+          If the URL pattern matches the "+fast-redirects" action, it is then processed.
+          Unwanted parts of the requested URL are stripped.
+        
+
+        
+          Now the rest of the client browser's request headers are
+          processed. If any of these match any of the relevant actions (e.g.
+          "+hide-user-agent", etc.), headers are
+          suppressed or forged as determined by these actions and their
+          parameters.
+        
+
+        
+          Now the web server starts sending its response back (i.e.
+          typically a web page).
+        
+
+        
+          First, the server headers are read and processed to determine,
+          among other things, the MIME type (document type) and encoding. The
+          headers are then filtered as determined by the "+crunch-incoming-cookies", "+session-cookies-only", and "+downgrade-http-version" actions.
+        
+
+        
+          If any "+filter" action or "+deanimate-gifs" action applies (and the
+          document type fits the action), the rest of the page is read into
+          memory (up to a configurable limit). Then the filter rules (from
+          default.filter and any other filter
+          files) are processed against the buffered content. Filters are
+          applied in the order they are specified in one of the filter files.
+          Animated GIFs, if present, are reduced to either the first or last
+          frame, depending on the action setting.The entire page, which is
+          now filtered, is then sent by Privoxy back to your browser.
+
+          If neither a "+filter" action or "+deanimate-gifs" matches, then Privoxy passes the raw data through to the
+          client browser as it becomes available.
+        
+
+        
+          As the browser receives the now (possibly filtered) page
+          content, it reads and then requests any URLs that may be embedded
+          within the page source, e.g. ad images, stylesheets, JavaScript,
+          other HTML documents (e.g. frames), sounds, etc. For each of these
+          objects, the browser issues a separate request (this is easily
+          viewable in Privoxy's logs). And
+          each such request is in turn processed just as above. Note that a
+          complex web page will have many, many such embedded URLs. If these
+          secondary requests are to a different server, then quite possibly a
+          very differing set of actions is triggered.
+        
+      
+
+      NOTE: This is somewhat of a simplistic overview of what happens with
+      each URL request. For the sake of brevity and simplicity, we have
+      focused on Privoxy's core features
+      only.
+    
+
+    
+      14.4.
+      Troubleshooting: Anatomy of an Action
+
+      The way Privoxy applies actions and filters to any given URL can be complex,
+      and not always so easy to understand what is happening. And sometimes
+      we need to be able to see
+      just what Privoxy is doing.
+      Especially, if something Privoxy is
+      doing is causing us a problem inadvertently. It can be a little
+      daunting to look at the actions and filters files themselves, since
+      they tend to be filled with regular
+      expressions whose consequences are not always so obvious.
+
+      One quick test to see if Privoxy is
+      causing a problem or not, is to disable it temporarily. This should be
+      the first troubleshooting step. See the Bookmarklets section for a quick
+      and easy way to do this (be sure to flush caches afterward!). Looking
+      at the logs is a good idea too. (Note that both the toggle feature and
+      logging are enabled via config file settings,
+      and may need to be turned "on".)
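+
+      As a rough sketch of the settings involved, the main config file
+      might contain lines like the following. The values shown are
+      assumptions for illustration only; check the comments in your own
+      config file before changing anything:
+
+      ```
+      enable-remote-toggle 1   # allow toggling Privoxy on and off via the web interface
+      logfile logfile          # name of the log file, relative to logdir
+      debug 1                  # log the destination of each request Privoxy let through
+      debug 1024               # log requests Privoxy did not let through, and why
+      ```
+
+      Enable only the debug levels you need, since more verbose logging
+      makes the log file grow quickly.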
+
+      Another easy troubleshooting step, if you have customized your
+      installation, is to revert to the installed
+      defaults and see if that helps. The developers sometimes get
+      complaints about one thing or another where the problem turns out to be
+      a customized configuration.
+
+      Privoxy also provides the http://config.privoxy.org/show-url-info page that can show
+      us very specifically how actions are
+      being applied to any given URL. This is a big help for
+      troubleshooting.
+
+      First, enter one URL (or partial URL) at the prompt, and then
+      Privoxy will tell us how the current
+      configuration will handle it. This will not help with filtering effects
+      (i.e. the "+filter" action) from one of the filter files since
+      this is handled very differently and is not so easy to trap! It also will
+      not tell you about any other URLs that may be embedded within the URL
+      you are testing. For instance, images such as ads are expressed as URLs
+      within the raw page source of HTML pages. So you will only get info for
+      the actual URL that is pasted into the prompt area -- not any sub-URLs.
+      If you want to know about embedded URLs like ads, you will have to dig
+      those out of the HTML source. Use your browser's "View Page Source" option for this. Or right click on
+      the ad, and grab the URL.
+
+      Let's try an example, google.com, and look at it one section at a time in a sample
+      configuration (your real configuration may vary):
+
+      
+        
+          
-          
-        
+              Matches for http://www.google.com:
 
  In file: default.action [ View ] [ View ] [ Edit ]
 (no matches in this file)
 
-            
-
-        
-          This is telling us how we have defined our "actions", and which ones match for our test
-          case, "google.com". Displayed is all the
-          actions that are available to us. Remember, the + sign denotes "on". - denotes "off". So
-          some are "on" here, but many are "off". Each example we try may provide a
-          slightly different end result, depending on our configuration
-          directives.
-        
-        
-          The first listing is for our default.action file. The large, multi-line listing,
-          is how the actions are set to match for all URLs, i.e. our default
-          settings. If you look at your "actions"
-          file, this would be the section just below the "aliases" section near the top. This will apply to
-          all URLs as signified by the single forward slash at the end of the
-          listing -- " / ".
-        
-        
-          But we have defined additional actions that would be exceptions to
-          these general rules, and then we list specific URLs (or patterns)
-          that these exceptions would apply to. Last match wins. Just below
-          this then are two explicit matches for ".google.com". The first is negating our previous
-          cookie setting, which was for "+session-cookies-only" (i.e. not persistent).
-          So we will allow persistent cookies for google, at least that is
-          how it is in this example. The second turns off any "+fast-redirects" action, allowing this to take
-          place unmolested. Note that there is a leading dot here -- ".google.com". This will match any hosts and
-          sub-domains, in the google.com domain also, such as "www.google.com" or "mail.google.com". But it would not match "www.google.de"! So, apparently, we have these
-          two actions defined as exceptions to the general rules at the top
-          somewhere in the lower part of our default.action file, and "google.com" is referenced somewhere in these latter
-          sections.
-        
-        
-          Then, for our user.action file, we again
-          have no hits. So there is nothing google-specific that we might
-          have added to our own, local configuration. If there was, those
-          actions would over-rule any actions from previously processed
-          files, such as default.action. user.action typically has the last word. This is
-          the best place to put hard and fast exceptions,
-        
-        
-          And finally we pull it all together in the bottom section and
-          summarize how Privoxy is applying
-          all its "actions" to "google.com":
-        
-        
-        
-        
-          
-            
+        
+      
-+          
+
+      This is telling us how we have defined our "actions",
+      and which ones match for our test case, "google.com". Displayed are all the actions that are
+      available to us. Remember, the + sign denotes
+      "on". - denotes
+      "off". So some are "on" here, but many are "off". Each example we try may provide a slightly
+      different end result, depending on our configuration directives.
+
+      The first listing is for our default.action file. The large, multi-line listing is
+      how the actions are set to match for all URLs, i.e. our default
+      settings. If you look at your "actions"
+      file, this would be the section just below the "aliases" section near the top. This will apply to all
+      URLs as signified by the single forward slash at the end of the listing
+      -- " / ".
+
+      But we have defined additional actions that would be exceptions to
+      these general rules, and then we list specific URLs (or patterns) that
+      these exceptions would apply to. Last match wins. Just below this then
+      are two explicit matches for ".google.com".
+      The first is negating our previous cookie setting, which was for
+      "+session-cookies-only" (i.e. not persistent). So we
+      will allow persistent cookies for google, at least that is how it is in
+      this example. The second turns off any "+fast-redirects" action, allowing this to take
+      place unmolested. Note that there is a leading dot here -- ".google.com". This will match any host or
+      sub-domain in the google.com domain, such as "www.google.com" or "mail.google.com". But it would not match "www.google.de"! So, apparently, we have these two
+      actions defined as exceptions to the general rules at the top somewhere
+      in the lower part of our default.action file,
+      and "google.com" is referenced somewhere in
+      these latter sections.
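+
+      To make this concrete, sections along the following lines would
+      produce the two matches just described. This is only a sketch; the
+      sections actually shipped in your default.action may be worded or
+      grouped differently:
+
+      ```
+      { -session-cookies-only }
+      .google.com
+
+      { -fast-redirects }
+      .google.com
+      ```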
+
+      Then, for our user.action file, we again
+      have no hits. So there is nothing google-specific that we might have
+      added to our own, local configuration. If there were, those actions
+      would overrule any actions from previously processed files, such as
+      default.action. user.action typically has the last word. This is the
+      best place to put hard and fast exceptions.
+
+      And finally we pull it all together in the bottom section and
+      summarize how Privoxy is applying all
+      its "actions" to "google.com":
+
+      
+        
+          
-          
-        
+              Final results:
 
  -add-header
@@ -924,27 +809,23 @@ In file: user.action [ View ] 
-            
-
-        
-          Notice the only difference here to the previous listing, is to
-          "fast-redirects" and "session-cookies-only", which are activated
-          specifically for this site in our configuration, and thus show in
-          the "Final Results".
-        
-        
-          Now another example, "ad.doubleclick.net":
-        
-        
-        
-        
-          
-            
+        
+      
-+          
+
+      Notice that the only difference from the previous listing is in
+      "fast-redirects" and "session-cookies-only", which are activated specifically
+      for this site in our configuration, and thus show in the "Final Results".
+
+      Now another example, "ad.doubleclick.net":
+
+      
+        
+          
-          
-        
+              { +block{Domains starts with "ad"} }
   ad*.
 
@@ -954,48 +835,41 @@ In file: user.action [ View ] 
-            
-
-        
-          We'll just show the interesting part here - the explicit matches.
-          It is matched three different times. Two "+block{}" sections, and a "+block{} +handle-as-image", which is the expanded
-          form of one of our aliases that had been defined as: "+block-as-image". ("Aliases" are defined in the first section of
-          the actions file and typically used to combine more than one
-          action.)
-        
-        
-          Any one of these would have done the trick and blocked this as an
-          unwanted image. This is unnecessarily redundant since the last case
-          effectively would also cover the first. No point in taking chances
-          with these guys though ;-) Note that if you want an ad or obnoxious
-          URL to be invisible, it should be defined as "ad.doubleclick.net" is done here -- as both a "+block{}" and an "+handle-as-image". The custom alias "+block-as-image"
-          just simplifies the process and make it more readable.
-        
-        
-          One last example. Let's try "http://www.example.net/adsl/HOWTO/". This one is
-          giving us problems. We are getting a blank page. Hmmm ...
-        
-        
-        
-        
-          
-            
+        
+      
-+          
+
+      We'll just show the interesting part here - the explicit matches. It
+      is matched three different times. Two "+block{}" sections, and a "+block{}
+      +handle-as-image", which is the expanded form of one of our
+      aliases that had been defined as: "+block-as-image". ("Aliases"
+      are defined in the first section of the actions file and typically used
+      to combine more than one action.)
+
+      Any one of these would have done the trick and blocked this as an
+      unwanted image. This is unnecessarily redundant since the last case
+      effectively would also cover the first. No point in taking chances with
+      these guys though ;-) Note that if you want an ad or obnoxious URL to
+      be invisible, it should be defined as "ad.doubleclick.net" is done here -- as both a "+block{}"
+      and a "+handle-as-image". The custom alias "+block-as-image" just
+      simplifies the process and makes it more readable.
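+
+      As an illustration of how such an alias might be defined and then
+      used, consider the sketch below. The block reason text "Blocked image."
+      is an assumption; the alias in your own actions file may use a
+      different one:
+
+      ```
+      {{alias}}
+      +block-as-image = +block{Blocked image.} +handle-as-image
+
+      { +block-as-image }
+      ad.doubleclick.net
+      ```
+
+      Defining the combination once as an alias means every section that
+      uses it stays short, and a later change only has to be made in one place.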
+
+      One last example. Let's try "http://www.example.net/adsl/HOWTO/". This one is giving
+      us problems. We are getting a blank page. Hmmm ...
+
+      
+        
+          
-          
-        
+              Matches for http://www.example.net/adsl/HOWTO/:
 
  In file: default.action [ View ] [ View ] 
-            
-
-        
-          Ooops, the "/adsl/" is matching "/ads" in our configuration! But we did not
-          want this at all! Now we see why we get the blank page. It is
-          actually triggering two different actions here, and the effects are
-          aggregated so that the URL is blocked, and Privoxy is told to treat the block as if it
-          were an image. But this is, of course, all wrong. We could now add
-          a new action below this (or better in our own user.action file) that explicitly un blocks ( "{-block}") paths with "adsl" in them (remember, last match in the
-          configuration wins). There are various ways to handle such
-          exceptions. Example:
-        
-        
-        
-        
-          
-            
+        
+      
-+          
+
+      Oops, the "/adsl/" is matching
+      "/ads" in our configuration! But we did not
+      want this at all! Now we see why we get the blank page. It is actually
+      triggering two different actions here, and the effects are aggregated
+      so that the URL is blocked, and Privoxy is told to treat the block as if it were
+      an image. But this is, of course, all wrong. We could now add a new
+      action below this (or better in our own user.action file) that explicitly unblocks ("{-block}")
+      paths with "adsl" in them (remember, last
+      match in the configuration wins). There are various ways to handle such
+      exceptions. Example:
+
+      
+        
+          
-          
-        
+              { -block }
   /adsl
 
-            
-
-        
-          Now the page displays ;-) Remember to flush your browser's caches
-          when making these kinds of changes to your configuration to insure
-          that you get a freshly delivered page! Or, try using Shift+Reload.
-        
-        
-          But now what about a situation where we get no explicit matches
-          like we did with:
-        
-        
-        
-        
-          
-            
+        
+      
-+          
+
+      Now the page displays ;-) Remember to flush your browser's caches
+      when making these kinds of changes to your configuration to ensure that
+      you get a freshly delivered page! Or, try using Shift+Reload.
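+
+      The "{ -block }" exception shown above applies to "/adsl" paths on
+      every host. As one of the other possible ways to handle this, and
+      purely as a sketch, the exception could be limited to the one site
+      that is giving us trouble by putting the host in front of the path:
+
+      ```
+      { -block }
+      www.example.net/adsl/
+      ```
+
+      The narrower pattern leaves "/adsl" paths on other hosts subject to
+      the original block rule, which may or may not be what you want.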
+
+      But now what about a situation where we get no explicit matches like
+      we did with:
+
+      
+        
+          
-          
-        
+              { +block{Path starts with "ads".} +handle-as-image }
  /ads
 
-            
-
-        
-          That actually was very helpful and pointed us quickly to where the
-          problem was. If you don't get this kind of match, then it means one
-          of the default rules in the first section of default.action is causing the problem. This would
-          require some guesswork, and maybe a little trial and error to
-          isolate the offending rule. One likely cause would be one of the "+filter" actions. These tend to be harder to
-          troubleshoot. Try adding the URL for the site to one of aliases
-          that turn off "+filter":
-        
-        
-        
-        
-          
-            
+        
+      
-+          
+
+      That actually was very helpful and pointed us quickly to where the
+      problem was. If you don't get this kind of match, then it means one of
+      the default rules in the first section of default.action is causing the problem. This would
+      require some guesswork, and maybe a little trial and error to isolate
+      the offending rule. One likely cause would be one of the "+filter"
+      actions. These tend to be harder to troubleshoot. Try adding the URL
+      for the site to one of the aliases that turn off "+filter":
+
+      
+        
+          
-          
-        
+              { shop }
  .quietpc.com
  .worldpay.com   # for quietpc.com
@@ -1143,111 +1006,96 @@ In file: user.action [ View ] 
-            
-
-        
-          "{ shop }" is
-          an "alias" that expands to "{ -filter -session-cookies-only
-          }". Or you could do your own exception to negate
-          filtering:
-        
-        
-        
-        
-          
-            
+        
+      
-+          
+
+      "{ shop }" is an
+      "alias" that expands to "{ -filter -session-cookies-only
+      }". Or you could write your own exception to negate
+      filtering:
+
+      
+        
+          
-          
-        
+              { -filter }
  # Disable ALL filter actions for sites in this section
  .forbes.com
  developer.ibm.com
  localhost
 
-            
-
-        
-          This would turn off all filtering for these sites. This is best put
-          in user.action, for local site
-          exceptions. Note that when a simple domain pattern is used by
-          itself (without the subsequent path portion), all sub-pages within
-          that domain are included automatically in the scope of the action.
-        
-        
-          Images that are inexplicably being blocked, may well be hitting the
-          "+filter{banners-by-size}" rule, which assumes
-          that images of certain sizes are ad banners (works well most of the time
-          since these tend to be standardized).
-        
-        
-          "{ fragile }"
-          is an alias that disables most actions that are the most likely to
-          cause trouble. This can be used as a last resort for problem sites.
-        
-        
-        
-        
-          
-            
+        
+      
-+          
+
+      This would turn off all filtering for these sites. This is best put
+      in user.action, for local site exceptions.
+      Note that when a simple domain pattern is used by itself (without the
+      subsequent path portion), all sub-pages within that domain are included
+      automatically in the scope of the action.
+
+      Images that are inexplicably being blocked may well be hitting the
+      "+filter{banners-by-size}" rule, which assumes that
+      images of certain sizes are ad banners (works well most of the time since these tend to be
+      standardized).
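+
+      If that turns out to be the cause, it should be enough to switch off
+      just this one filter for the affected site rather than all filtering.
+      A minimal sketch, in which the host pattern ".example.com" is a
+      placeholder for the site in question:
+
+      ```
+      { -filter{banners-by-size} }
+      .example.com
+      ```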
+
+      "{ fragile }" is
+      an alias that disables most of the actions that are likely to cause
+      trouble. This can be used as a last resort for problem sites.
+
+      
+        
+          
-          
-        
+              { fragile }
  # Handle with care: easy to break
  mail.google.
  mybank.example.com
 
-            
-
-        
-          Remember to flush
-          caches! Note that the mail.google reference lacks the TLD portion (e.g.
-          ".com"). This will effectively match any
-          TLD with google in it, such as mail.google.de., just as an example.
-        
-        
-          If this still does not work, you will have to go through the
-          remaining actions one by one to find which one(s) is causing the
-          problem.
-        
-      
-    
-    
-      
-      
-        
-          
-          
-          
-        
-        
-          
-          
-          
-            Prev
-          
-            Home
-          
-             
-          

-            See Also
-          
-             
-          
-             
           
         
       
+
+      Remember to flush caches!
+      Note that the mail.google reference lacks the
+      TLD portion (e.g. ".com"). This will
+      effectively match mail.google under any TLD,
+      such as mail.google.de, just as an
+      example.
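+
+      The effect of leading and trailing dots in host patterns can be
+      summarized with a few side-by-side patterns. This is a sketch for
+      comparison only, not a recommended configuration:
+
+      ```
+      { fragile }
+      .google.com     # google.com and every host or sub-domain below it
+      mail.google.    # mail.google.com, mail.google.de, and so on
+      www.google.com  # this one host only
+      ```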
+
+      If this still does not work, you will have to go through the
+      remaining actions one by one to find which one(s) is causing the
+      problem.
     
-  
-
+  
 
+  
+    
+
+    
+      
+        
+
+        
+
+        
+      
+
+      
+        
+
+        
+
+        
+      
+  
+
+
