X-Git-Url: http://www.privoxy.org/gitweb/?a=blobdiff_plain;ds=sidebyside;f=doc%2Fwebserver%2Fuser-manual%2Fappendix.html;h=87413b5fc5724b478e5402dc6caa656e78ed13c9;hb=47ebc96cd29a0e2de4c752e7b42006c27d365064;hp=e5356f36b8b5667998afb95a5785caf955fca7da;hpb=7017546f48d1189837d3b8d6a523328195279e57;p=privoxy.git
diff --git a/doc/webserver/user-manual/appendix.html b/doc/webserver/user-manual/appendix.html
index e5356f36..87413b5f 100644
--- a/doc/webserver/user-manual/appendix.html
+++ b/doc/webserver/user-manual/appendix.html
@@ -65,7 +65,7 @@ CLASS="SECT1"
CLASS="SECT1" >9. Appendix14. Appendix
These are just some of the ones you are likely to use when matching URLs with
- Show information about the current configuration:
+ Show information about the current configuration, including viewing and
+ editing of actions files:
- Show the client's request headers:
+ Show the browser's request headers:
9.2. 14.2. Privoxy's Internal Pages
Short cuts. Turn off, then on:
- Edit the actions list file: -
These may be bookmarked for quick reference.
These may be bookmarked for quick reference. See next.Below are some "may not be safe" - just click OK. Then you can run the - Bookmarklet directly from your favourites/bookmarks. For even faster access, + Bookmarklet directly from your favorites/bookmarks. For even faster access, you can put them on the "Links" Enable PrivoxyPrivoxy - Enable
Disable PrivoxyPrivoxy - Disable Toggle PrivoxyPrivoxy - Toggle Privoxy (Toggles between enabled and disabled) View Privoxy StatusPrivoxy- View Status Actions file feedback systemPrivoxy - Submit Filter FeedbackLet's take a quick look at the basic sequence of events when a web page is + requested by your browser and Privoxy is on duty:
First, your web browser requests a web page. The browser knows to send + the request to Privoxy, which will, in turn, + relay the request to the remote web server after passing the following + tests: +
Privoxy traps any request for its own internal CGI + pages (e.g. http://p.p/) and sends the CGI page back to the browser. +
Next, Privoxy checks to see if the URL + matches any "+block" patterns. If + so, the URL is then blocked, and the remote web server will not be contacted. + "+handle-as-image" + is then checked and if it does not match, an + HTML "BLOCKED" page is sent back. Otherwise, if it does match, + an image is returned. The type of image depends on the setting of "+set-image-blocker" + (blank, checkerboard pattern, or an HTTP redirect to an image elsewhere). +
Untrusted URLs are blocked. If URLs are being added to the + trust file, then that is done. +
If the URL pattern matches the "+fast-redirects" action, + it is then processed. Unwanted parts of the requested URL are stripped. +
Now the rest of the client browser's request headers are processed. If any + of these match any of the relevant actions (e.g. "+hide-user-agent", + etc.), headers are suppressed or forged as determined by these actions and + their parameters. +
Now the web server starts sending its response back (i.e. typically a web page and related + data). +
First, the server headers are read and processed to determine, among other + things, the MIME type (document type) and encoding. The headers are then + filtered as determined by the + "+prevent-setting-cookies", + "+session-cookies-only", + and "+downgrade-http-version" + actions. +
If the "+kill-popups" + action applies, and it is an HTML or JavaScript document, the popup-code in the + response is filtered on-the-fly as it is received. +
If a "+filter" + or "+deanimate-gifs" + action applies (and the document type fits the action), the rest of the page is + read into memory (up to a configurable limit). Then the filter rules (from + default.filter) are processed against the buffered + content. Filters are applied in the order they are specified in the + default.filter file. Animated GIFs, if present, are + reduced to either the first or last frame, depending on the action + setting. The entire page, which is now filtered, is then sent by + Privoxy back to your browser. +
If neither "+filter" + nor "+deanimate-gifs" + matches, then Privoxy passes the raw data through + to the client browser as it becomes available. +
As the browser receives the (now probably filtered) page content, it + reads and then requests any URLs that may be embedded within the page + source, e.g. ad images, stylesheets, JavaScript, other HTML documents (e.g. + frames), sounds, etc. For each of these objects, the browser issues a new + request. And each such request is in turn processed as above. Note that a + complex web page may have many such embedded URLs. +
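The order of the checks described above can be sketched in a few lines of Python. This is only an illustrative model, not Privoxy's actual code: the `Actions` class, the `handle_request` function and the returned labels are all hypothetical names, and the header-processing steps are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Actions:
    """Hypothetical container for the actions that apply to one URL."""
    block: bool = False
    handle_as_image: bool = False
    deanimate_gifs: bool = False
    filters: list = field(default_factory=list)


def handle_request(url: str, actions: Actions, trusted: bool = True) -> str:
    """Return a label naming the step that decides what happens to the request."""
    # 1. Requests for the internal CGI pages never leave the proxy.
    if url.startswith("http://p.p/"):
        return "internal-cgi"
    # 2. "+block" stops the request; "+handle-as-image" picks the reply type.
    if actions.block:
        return "blocked-image" if actions.handle_as_image else "blocked-page"
    # 3. Untrusted URLs are blocked as well.
    if not trusted:
        return "blocked-untrusted"
    # 4. Otherwise the request is relayed. The response body is buffered
    #    only if a filter or "+deanimate-gifs" applies; else it is streamed.
    if actions.filters or actions.deanimate_gifs:
        return "forwarded-buffered"
    return "forwarded-streamed"
```

In this model the earlier checks take priority: a request for http://p.p/ is answered internally even if a "+block" pattern would also match it.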
The way Privoxy applies
+ "actions"
- and "filters" to any given URL can be complex, and not always so
+>
+ to any given URL can be complex, and not always so
easy to understand what is happening. And sometimes we need to be able to
Privoxy is doing
- is causing us a problem inadvertantly. It can be a little daunting to look at
+ is causing us a problem inadvertently. It can be a little daunting to look at
the actions and filters files themselves, since they tend to be filled with
"regular expressions" whose consequences are not always
- so obvious. One quick test to see if Privoxy is causing a problem
+ or not, is to disable it temporarily. This should be the first troubleshooting
+ step. See the Bookmarklets section for a quick
+ and easy way to do this (be sure to flush caches afterward!). Privoxy provides the
+> also provides the
actions
First, enter one URL (or partial URL) at the prompt, and then Privoxy will tell us how the current configuration will handle it. This will not - help with filtering effects from the "+filter" action) from + the default.filter file! It - also will not tell you about any other URLs that may be embedded within the - URL you are testing. For instance, images such as ads are expressed as URLs - within the raw page source of HTML pages. So you will only get info for the - actual URL that is pasted into the prompt area -- not any sub-URLs. If you - want to know about embedded URLs like ads, you will have to dig those out of - the HTML source. Use your browser's "View Page Source" option - for this. Or right click on the ad, and grab the URL.
Let's try an example, google.com, - one section at a time:
System default actions: +> Matches for http://google.com: - { -add-header -block -deanimate-gifs -downgrade -fast-redirects -filter - -hide-forwarded -hide-from -hide-referer -hide-user-agent -image - -image-blocker -limit-connect -no-compression -no-cookies-keep - -no-cookies-read -no-cookies-set -no-popups -vanilla-wafer -wafer } - - |
This is the top section, and only tells us of the compiled in defaults. This - is basically what Privoxy would do if there - were not any "actions" defined, i.e. it does nothing. Every action - is disabled. This is not particularly informative for our purposes here. OK, - next section:
Matches for http://google.com: +--- File standard --- +(no matches in this file) - { -add-header -block +deanimate-gifs -downgrade +fast-redirects - +filter{html-annoyances} +filter{js-annoyances} +filter{no-popups} - +filter{webbugs} +filter{nimda} +filter{banners-by-size} +filter{hal} - +filter{fun} +hide-forwarded +hide-from{block} +hide-referer{forge} - -hide-user-agent -image +image-blocker{blank} +no-compression - +no-cookies-keep -no-cookies-read -no-cookies-set +no-popups - -vanilla-wafer -wafer } - / +--- File default --- + +{ -add-header -block +deanimate-gifs{last} -downgrade-http-version +fast-redirects + -filter{popups} -filter{fun} -filter{shockwave-flash} -filter{crude-parental} + +filter{html-annoyances} +filter{js-annoyances} +filter{content-cookies} + +filter{webbugs} +filter{refresh-tags} +filter{nimda} +filter{banners-by-size} + +hide-forwarded-for-headers +hide-from-header{block} +hide-referer{forge} + -hide-user-agent -handle-as-image +set-image-blocker{pattern} -limit-connect + +prevent-compression +session-cookies-only -prevent-reading-cookies + -prevent-setting-cookies -kill-popups -send-vanilla-wafer -send-wafer } +/ - { -no-cookies-keep -no-cookies-read -no-cookies-set } - .google.com + { -session-cookies-only } + .google.com { -fast-redirects } - .google.com + .google.com - |
This is much more informative, and tells us how we have defined our - This tells us how we have defined our + "actions", and which ones match for our example, - , and + which ones match for our example, "google.com". The first grouping shows our default - settings, which would apply to all URLs. If you look at your . The first listing + is any matches for the standard.action file. No hits at + all here on "standard". Then next is "default", or + our default.action file. The large, multi-line listing, + is how the actions are set to match for all URLs, i.e. our default settings. + If you look at your "actions" - file, this would be the section just below the file, this would be the section + just below the "aliases" section - near the top. This applies to all URLs as signified by the single forward - slash -- section near the top. This will apply to + all URLs as signified by the single forward slash at the end of the listing + -- "/". -
.These are the default actions we have enabled. But we can define additional - actions that would be exceptions to these general rules, and then list - specific URLs that these exceptions would apply to. Last match wins. - Just below this then are two explict matches for But we can define additional actions that would be exceptions to these general + rules, and then list specific URLs (or patterns) that these exceptions would + apply to. Last match wins. Just below this then are two explicit matches for + ".google.com". - The first is negating our various cookie blocking actions (i.e. we will allow - cookies here). The second is allowing . The first is negating our previous cookie setting, + which was for "fast-redirects". Note - that there is a leading dot here -- "+session-cookies-only" + (i.e. not persistent). So we will allow persistent cookies for google. The + second turns off any + "+fast-redirects" + action, allowing this to take place unmolested. Note that there is a leading + dot here -- ".google.com". This will - match any hosts and sub-domains, in the google.com domain also, such as +>. This will match any hosts and + sub-domains, in the google.com domain also, such as "www.google.com". So, apparently, we have these actions defined - somewhere in the lower part of our actions file, and - . So, apparently, we have these two actions + defined somewhere in the lower part of our default.action + file, and "google.com" is referenced in these sections.
is referenced somewhere in these latter + sections. Then, for our user.action file, we again have no hits.
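The "last match wins" rule can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not Privoxy's implementation: each matching section is represented as a dictionary of on/off settings, `merge_actions` is a hypothetical helper, and parameterized actions such as "+filter{...}" are ignored.

```python
def merge_actions(matching_sections):
    """Combine matching sections in file order; later settings override earlier ones."""
    final = {}
    for section in matching_sections:
        final.update(section)
    return final


# A few of the defaults that apply to every URL (the "/" pattern).
defaults = {
    "fast-redirects": True,
    "session-cookies-only": True,
    "hide-user-agent": False,
}

# The two explicit ".google.com" sections found further down the file.
google_sections = [
    {"session-cookies-only": False},
    {"fast-redirects": False},
]

final = merge_actions([defaults, *google_sections])
```

Here `final` ends up with "fast-redirects" and "session-cookies-only" switched off while "hide-user-agent" keeps its default, which mirrors the two differences seen for google.com.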
And now we pull it altogether in the bottom section and summarize how
+> And finally we pull it all together in the bottom section and summarize how
Privoxy is appying all its is applying all its "actions"
@@ -1258,21 +1577,28 @@ WIDTH="100%"
>
Final results:
-
- -add-header -block -deanimate-gifs -downgrade -fast-redirects
- +filter{html-annoyances} +filter{js-annoyances} +filter{no-popups}
- +filter{webbugs} +filter{nimda} +filter{banners-by-size} +filter{hal}
- +filter{fun} +hide-forwarded +hide-from{block} +hide-referer{forge}
- -hide-user-agent -image +image-blocker{blank} -limit-connect +no-compression
- -no-cookies-keep -no-cookies-read -no-cookies-set +no-popups -vanilla-wafer
- -wafer
-
-
Notice that the only difference here from the previous listing is in + "fast-redirects" and "session-cookies-only".
Now another example, "ad.doubleclick.net"
{ +block +image } +> { +block +handle-as-image } .ad.doubleclick.net - { +block +image } + { +block +handle-as-image } ad*. - { +block +image } - .doubleclick.net - -
Any one of these would have done the trick and blocked this as an unwanted @@ -1325,21 +1653,31 @@ CLASS="QUOTE" CLASS="QUOTE" >"ad.doubleclick.net"
 - is done here -- as both a "+block" + and an - an + "+image". The custom alias "+handle-as-image". + The custom alias "+imageblock" does this - for us. just simplifies the process and makes + it more readable. One last example. Let's try Matches for http://www.rhapsodyk.net/adsl/HOWTO/: - { -add-header -block +deanimate-gifs -downgrade +fast-redirects - +filter{html-annoyances} +filter{js-annoyances} +filter{no-popups} + { -add-header -block +deanimate-gifs -downgrade-http-version +fast-redirects + +filter{html-annoyances} +filter{js-annoyances} +filter{kill-popups} +filter{webbugs} +filter{nimda} +filter{banners-by-size} +filter{hal} - +filter{fun} +hide-forwarded +hide-from{block} +hide-referer{forge} - -hide-user-agent -image +image-blocker{blank} +no-compression - +no-cookies-keep -no-cookies-read -no-cookies-set +no-popups - -vanilla-wafer -wafer } + +filter{fun} +hide-forwarded-for-headers +hide-from-header{block} + +hide-referer{forge} -hide-user-agent -handle-as-image +set-image-blocker{blank} + +prevent-compression +session-cookies-only -prevent-setting-cookies + -prevent-reading-cookies +kill-popups -send-vanilla-wafer -send-wafer } / - { +block +image } - /ads - - "/ads"! But we did not want this at all! Now we see why we get the blank page. We could - now add a new action below this that explictly does not - block (-block) pages with "{-block}") paths with "adsl". There are various ways to - handle such exceptions. Example:
. There are + various ways to handle such exceptions. Example:
{ -block } - /adsl - -{ +block +handle-as-image } + /ads
This would probably be most appropriately put in user.action, + for local site exceptions.
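Why "/ads" catches "/adsl/HOWTO/", and why a later "{ -block }" section undoes it, can be sketched in Python. This assumes path patterns behave like regular expressions anchored at the start of the path, which is a simplification; `path_matches` and `is_blocked` are hypothetical helpers, not part of Privoxy.

```python
import re


def path_matches(pattern: str, path: str) -> bool:
    """Assumed model: the pattern is a regex matched from the start of the path."""
    return re.match(pattern, path) is not None


def is_blocked(path: str, rules) -> bool:
    """Apply (pattern, block) rules in file order; the last matching rule wins."""
    blocked = False
    for pattern, block in rules:
        if path_matches(pattern, path):
            blocked = block
    return blocked


# The rule that caused the trouble: "{ +block +handle-as-image }" for "/ads".
rules = [("/ads", True)]

# The same rules with the "{ -block }" exception for "/adsl" added below.
rules_with_exception = rules + [("/adsl", False)]
```

With `rules` alone, `is_blocked("/adsl/HOWTO/", rules)` is true because "/ads" is a prefix of "/adsl". With `rules_with_exception` the later "/adsl" rule wins for that path, while a path such as "/ads/banner.gif" stays blocked.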
"{fragile}"