From 9191f5d733a8cb851bcaf65ac6d90ba6158709f5 Mon Sep 17 00:00:00 2001 From: hal9 Date: Fri, 26 Apr 2002 05:24:36 +0000 Subject: [PATCH] -Add most of Andreas suggestions to Chain of Events section. -A few other minor corrections and touch up. --- doc/source/user-manual.sgml | 207 +++++++++++++++++++++--------------- 1 file changed, 124 insertions(+), 83 deletions(-) diff --git a/doc/source/user-manual.sgml b/doc/source/user-manual.sgml index c75aa4a7..4cbf7e65 100644 --- a/doc/source/user-manual.sgml +++ b/doc/source/user-manual.sgml @@ -5050,15 +5050,15 @@ Requests First, your web browser requests a web page. The browser knows to send - the request to Privoxy, who in turn, - will relay the request to the remote web server after passing the following + the request to Privoxy, which will in turn, + relay the request to the remote web server after passing the following tests: Privoxy traps any request for its own internal CGI - pages (e.g http://p.p/) and sends these back to the browser. + pages (e.g http://p.p/) and sends the CGI page back to the browser. @@ -5066,13 +5066,13 @@ Requests Next, Privoxy checks to see if the URL matches any +block patterns. If - so, the remote web server is not contacted, and the URL is then further - checked against +handle-as-image. If both match, then the - setting of - +set-image-blocker is used to display whichever - option is appropriate. If +handle-as-image - does not match, then the BLOCKED banner page is displayed. + so, the URL is then blocked, and the remote web server will not be contacted. + +handle-as-image + is then checked and if it does not match, an + HTML BLOCKED page is sent back. Otherwise, if it does match, + an image is returned. The type of image depends on the setting of +set-image-blocker + (blank, checkerboard pattern, or an HTTP redirect to an image elsewhere). @@ -5083,44 +5083,74 @@ Requests - +fast-redirects - is processed, stripping unwanted parts of the requested web page URL. 
+ If the URL pattern matches the +fast-redirects action, + it is then processed. Unwanted parts of the requested URL are stripped. - At this point, Privoxy now relays the URL to the - web server, requesting the page (assuming nothing up to this point has - prevented getting us from this far). + Now the rest of the client browser's request headers are processed. If any + of these match any of the relevant actions (e.g. +hide-user-agent, + etc.), headers are suppressed or forged as determined by these actions and + their parameters. - The first few hundred bytes are read from the web server and - +kill-popups - is processed, if enabled. + Now the web server starts sending its response back (i.e. typically a web page and related + data). - If +filter - applies, the rest of the page is read into memory and then the filter rules - (from default.filter) are processed. Filters are - applied in the order they are specified in the - default.filter file. The entire page, which is now - filtered, is then sent by Privoxy back to your - browser. + First, the server headers are read and processed to determine, among other + things, the MIME type (document type) and encoding. The headers are then + filtered as determined by the + +prevent-setting-cookies, + +session-cookies-only, + and +downgrade-http-version + actions. - As the browser receives the now filtered page content, it will read and request any - embedded URLs on the page, e.g. ad images. As the browser requests these - secondary URLs from whatever server they may be on, - Privoxy handles these same as above, and the process - is repeated all over again for each such URL. Note that a fancy web page may - have many, many such embedded URLs for graphics, frames, etc. + If the +kill-popups + action applies, and it is an HTML or JavaScript document, the popup-code in the + response is filtered on-the-fly as it is received. 
+ + + + + If a +filter + or +deanimate-gifs + action applies (and the document type fits the action), the rest of the page is + read into memory (up to a configurable limit). Then the filter rules (from + default.filter) are processed against the buffered + content. Filters are applied in the order they are specified in the + default.filter file. Animated GIFs, if present, are + reduced to either the first or last frame, depending on the action + setting. The entire page, which is now filtered, is then sent by + Privoxy back to your browser. + + + If neither +filter + nor +deanimate-gifs + matches, then Privoxy passes the raw data through + to the client browser as it becomes available. + + + + + As the browser receives the (now probably filtered) page content, it + reads and then requests any URLs that may be embedded within the page + source, e.g. ad images, stylesheets, JavaScript, other HTML documents (e.g. + frames), sounds, etc. For each of these objects, the browser issues a new + request. And each such request is in turn processed as above. Note that a + complex web page may have many such embedded URLs. @@ -5166,16 +5196,17 @@ Requests First, enter one URL (or partial URL) at the prompt, and then Privoxy will tell us how the current configuration will handle it. This will not - help with filtering effects (i.e. the +filter action) from the - default.filter file since this is handled very differently - and not so easy to trap! It also will not tell you about any other URLs that - may be embedded within the URL you are testing (i.e. a web page). For - instance, images such as ads are expressed as URLs within the raw page source - of HTML pages. So you will only get info for the actual URL that is pasted - into the prompt area -- not any sub-URLs. If you want to know about embedded - URLs like ads, you will have to dig those out of the HTML source. Use your - browser's View Page Source option for this. Or right click on - the ad, and grab the URL. 
+ help with filtering effects (i.e. the +filter action) from + the default.filter file since this is handled very + differently and not so easy to trap! It also will not tell you about any other + URLs that may be embedded within the URL you are testing. For instance, images + such as ads are expressed as URLs within the raw page source of HTML pages. So + you will only get info for the actual URL that is pasted into the prompt area + -- not any sub-URLs. If you want to know about embedded URLs like ads, you + will have to dig those out of the HTML source. Use your browser's View + Page Source option for this. Or right click on the ad, and grab the + URL. @@ -5198,11 +5229,11 @@ Requests +filter{webbugs} +filter{refresh-tags} +filter{nimda} +filter{banners-by-size} +hide-forwarded-for-headers +hide-from-header{block} +hide-referer{forge} -hide-user-agent -handle-as-image +set-image-blocker{pattern} -limit-connect - +prevent-compression +session-cookies-only +prevent-reading-cookies - +prevent-setting-cookies -kill-popups -send-vanilla-wafer -send-wafer } + +prevent-compression +session-cookies-only -prevent-reading-cookies + -prevent-setting-cookies -kill-popups -send-vanilla-wafer -send-wafer } / - { -prevent-setting-cookies -prevent-reading-cookies } + { -session-cookies-only } .google.com { -fast-redirects } @@ -5215,40 +5246,45 @@ Requests - This tells us how we have defined our actions, and which ones - match for our example, google.com. The first listing is - for the standard.action. No hits at all here on - standard. Then next is default, or our - default.action file. The large, multi-line listing, is - how the actions are set to match for all URLs, i.e. our default settings. If - you look at your actions file, this would be the section just - below the aliases section near the top. This will apply to all - URLs as signified by the single forward slash at the end of the listing -- - /. 
- + This tells us how we have defined our + actions, and + which ones match for our example, google.com. The first listing + is any matches for the standard.action file. No hits at + all here on standard. Then next is default, or + our default.action file. The large, multi-line listing, + is how the actions are set to match for all URLs, i.e. our default settings. + If you look at your actions file, this would be the section + just below the aliases section near the top. This will apply to + all URLs as signified by the single forward slash at the end of the listing + -- /. But we can define additional actions that would be exceptions to these general rules, and then list specific URLs (or patterns) that these exceptions would apply to. Last match wins. Just below this then are two explicit matches for - .google.com. The first is negating our various cookie blocking - actions (i.e. we will allow cookies here). The second is allowing - fast-redirects to take place. Note that there is a leading dot - here -- .google.com. This will match any hosts and sub-domains, - in the google.com domain also, such as www.google.com. So, - apparently, we have these two actions defined somewhere in the lower part of our - actions file, and google.com is referenced somewhere in these - latter sections. + .google.com. The first is negating our previous cookie setting, + which was for +session-cookies-only + (i.e. not persistent). So we will allow persistent cookies for google. The + second turns off any + +fast-redirects + action, allowing this to take place unmolested. Note that there is a leading + dot here -- .google.com. This will match any hosts and + sub-domains, in the google.com domain also, such as + www.google.com. So, apparently, we have these two actions + defined somewhere in the lower part of our default.action + file, and google.com is referenced somewhere in these latter + sections. - Then, for our user.action file, we again have no hits, as - signified by File user. 
+ Then, for our user.action file, we again have no hits. - And finally we pull it altogether in the bottom section and summarize how + And finally we pull it all together in the bottom section and summarize how Privoxy is applying all its actions to google.com: @@ -5264,7 +5300,7 @@ Requests +filter{webbugs} +filter{refresh-tags} +filter{nimda} +filter{banners-by-size} +hide-forwarded-for-headers +hide-from-header{block} +hide-referer{forge} -hide-user-agent -handle-as-image +set-image-blocker{pattern} -limit-connect - +prevent-compression +session-cookies-only -prevent-reading-cookies + +prevent-compression -session-cookies-only -prevent-reading-cookies -prevent-setting-cookies -kill-popups -send-vanilla-wafer -send-wafer @@ -5272,7 +5308,7 @@ Requests Notice the only difference here to the previous listing, is to - fast-redirects and the two cookie settings. + fast-redirects and session-cookies-only. @@ -5298,8 +5334,9 @@ Requests We'll just show the interesting part here, the explicit matches. It is matched three different times. Each as an +block +handle-as-image, which is the expanded form of one of our aliases that had been defined as: - +imageblock. (Aliases are defined in the - first section of the actions file and typically used to combine more + +imageblock. (Aliases are defined in + the first section of the actions file and typically used to combine more than one action.) @@ -5309,9 +5346,13 @@ Requests would also cover the first. No point in taking chances with these guys though ;-) Note that if you want an ad or obnoxious URL to be invisible, it should be defined as ad.doubleclick.net - is done here -- as both a +block and an - +handle-as-image. The custom alias +imageblock does this - for us. + is done here -- as both a +block + and an + +handle-as-image. + The custom alias +imageblock just simplifies the process and makes + it more readable. 
@@ -5329,8 +5370,8 @@ Requests +filter{webbugs} +filter{nimda} +filter{banners-by-size} +filter{hal} +filter{fun} +hide-forwarded-for-headers +hide-from-header{block} +hide-referer{forge} -hide-user-agent -handle-as-image +set-image-blocker{blank} - +prevent-compression +session-cookies-only +prevent-setting-cookies - +prevent-reading-cookies +kill-popups -send-vanilla-wafer -send-wafer } + +prevent-compression +session-cookies-only -prevent-setting-cookies + -prevent-reading-cookies +kill-popups -send-vanilla-wafer -send-wafer } / { +block +handle-as-image } @@ -5343,8 +5384,8 @@ Requests Ooops, the /adsl/ is matching /ads! But we did not want this at all! Now we see why we get the blank page. We could now add a new action below this that explicitly does not - block (-block) pages with adsl. There are various ways to - handle such exceptions. Example: + block ({-block}) paths with adsl. There are + various ways to handle such exceptions. Example: @@ -5369,9 +5410,9 @@ Requests - { -block } - /adsl - + { +block +handle-as-image } + /ads + @@ -5399,7 +5440,7 @@ Requests {shop} is an alias that expands to - { -filter -prevent-setting-cookies -prevent-reading-cookies }. + { -filter -session-cookies-only }. Or you could do your own exception to negate filtering: @@ -5415,7 +5456,7 @@ Requests This would probably be most appropriately put in user.action, - for personal user exceptions. + for local site exceptions. -- 2.39.2