From: Fabian Keil Date: Thu, 5 Apr 2007 11:47:51 +0000 (+0000) Subject: Some updates regarding header filtering, X-Git-Tag: v_3_0_7~288 X-Git-Url: http://www.privoxy.org/gitweb/?p=privoxy.git;a=commitdiff_plain;h=740f1bb7087065eeb921792711bba4db3277cef8 Some updates regarding header filtering, handling of compressed content and redirect's support for pcrs commands. --- diff --git a/doc/source/user-manual.sgml b/doc/source/user-manual.sgml index 2294fe96..e2a45aae 100644 --- a/doc/source/user-manual.sgml +++ b/doc/source/user-manual.sgml @@ -11,7 +11,7 @@ - + @@ -33,9 +33,9 @@ This file belongs into ijbswa.sourceforge.net:/home/groups/i/ij/ijbswa/htdocs/ - $Id: user-manual.sgml,v 2.27 2006/11/14 01:57:47 hal9 Exp $ + $Id: user-manual.sgml,v 2.28 2006/12/10 23:42:48 hal9 Exp $ - Copyright (C) 2001- 2006 Privoxy Developers http://www.privoxy.org + Copyright (C) 2001-2007 Privoxy Developers http://www.privoxy.org/ See LICENSE. ======================================================================== @@ -54,12 +54,12 @@ - Copyright &my-copy; 2001 - 2006 by + Copyright &my-copy; 2001 - 2007 by Privoxy Developers -$Id: user-manual.sgml,v 2.27 2006/11/14 01:57:47 hal9 Exp $ +$Id: user-manual.sgml,v 2.28 2006/12/10 23:42:48 hal9 Exp $ @@ -2719,6 +2721,84 @@ for details. + + +client-header-filter + + + + Typical use: + + + Rewrite or remove single client headers. + + + + + + Effect: + + + All client headers to which this action applies are filtered on-the-fly through + the specified regular expression based substitutions. + + + + + + Type: + + + Parameterized. + + + + + Parameter: + + + The name of a client-header filter, as defined in one of the + filter files. + + + + + + Notes: + + + Client-header filters are applied to each header on its own, not to + all at once. This makes it easier to diagnose problems, but on the downside + you can't write filters that only change header x if header y's value is z. 
+ + + Client-header filters are executed after the other header actions have finished + and use their output as input. + + + Please refer to the filter file chapter + to learn which client-header filters are available by default, and how to + create your own. + + + + + + Example usage (section): + + + +{+client-header-filter{hide-tor-exit-notation}} +.exit/ + + + + + + + + + - -filter-client-headers - - - - Typical use: - - - To apply filtering to the client's (browser's) headers - - - - - - Effect: - - - By default, Privoxy's filters only apply - to the document content itself. This will extend those filters to - include the client's headers as well. - - - - - - Type: - - - Boolean. - - - - - Parameter: - - - N/A - - - - - - Notes: - - - Regular expressions can be used to filter headers as well. Check your - filters closely before activating this action, as it can easily lead to broken - requests. - - - These filters are applied to each header on its own, not to them - all at once. This makes it easier to diagnose problems, but on the downside - you can't write filters that only change header x if header y's value is - z. - - - The filters are used after the other header actions have finished and can - use their output as input. - - - - Whenever possible one should specify ^, - $, the whole header name and the colon, to make sure - the filter doesn't cause havoc to other headers or the - page itself. For example if you want to transform - Galeon User-Agents to - Firefox User-Agents you - shouldn't use: - - - -s@Galeon/\d\.\d\.\d @@ - - - but: - - -s@^(User-Agent:.*) Galeon/\d\.\d\.\d (Firefox/\d\.\d\.\d\.\d)$@$1 $2@ - - - - - - - Example usage (section): - - - -{+filter-client-headers +filter{test_filter}} -problem-host.example.com - - - - - - - - - - - -filter-server-headers - - - - Typical use: - - - To apply filtering to the server's headers - - - - - - Effect: - - - By default, Privoxy's filters only apply - to the document content itself. 
This will extend those filters to - include the server's headers as well. - - - - - - Type: - - - Boolean. - - - - - Parameter: - - - N/A - - - - Notes: - - - Similar to filter-client-headers, but works on - the server instead. To filter both server and client, use both. - - - As with filter-client-headers, check your - filters before activating this action, as it can easily lead to broken - requests. - - - These filters are applied to each header on its own, not to them - all at once. This makes it easier to diagnose problems, but on the downside - you can't write filters that only change header x if header y's value is - z. - - - The filters are used after the other header actions have finished and can - use their output as input. - - - Remember too, whenever possible one should specify ^, - $, the whole header name and the colon, to make sure - the filter doesn't cause havoc to other headers or the - page itself. See above for example. - - - - - - - Example usage (section): - - - -{+filter-server-headers +filter{test_filter}} -problem-host.example.com - - - - - @@ -5046,23 +4922,33 @@ new action More and more websites send their content compressed by default, which - is generally a good idea and saves bandwidth. But for the filter, deanimate-gifs - and kill-popups actions to work, - Privoxy needs access to the uncompressed data. - Unfortunately, Privoxy can't yet(!) uncompress, filter, and - re-compress the content on the fly. So if you want to ensure that all websites, including - those that normally compress, can be filtered, you need to use this action. + and kill-popups actions need + access to the uncompressed data. + + + When compiled with zlib support (available since &my-app; 3.0.7), content that should be + filtered is decompressed on-the-fly and you don't have to worry about this action. 
+ If you are using an older &my-app; version, or one that hasn't been compiled with zlib + support, this action can be used to convince the server to send the content uncompressed. - This will slow down transfers from those websites, though. If you use any of the above-mentioned - actions, you will typically want to use prevent-compression in conjunction - with them. + Most text-based instances compress very well, the size is seldom decreased by less than 50%, + for markup-heavy instances like news feeds saving more than 90% of the original size isn't + unusual. + + + Not using compression will therefore slow down the transfer, and you should only + enable this action if you really need it. As of &my-app; 3.0.7 it's disabled in all + predefined action settings. Note that some (rare) ill-configured sites don't handle requests for uncompressed - documents correctly (they send an empty document body). If you use prevent-compression - per default, you'll have to add exceptions for those sites. See the example for how to do that. + documents correctly. Broken PHP applications tend to send an empty document body, + some IIS versions only send the beginning of the content. If you enable + prevent-compression per default, you might want to add + exceptions for those sites. See the example for how to do that. @@ -5085,11 +4971,10 @@ new action { +prevent-compression } / # Match all sites -# Then maybe make exceptions for ill-behaved sites: +# Then maybe make exceptions for broken sites: # { -prevent-compression } - .debianhelp.org - www.pclinuxonline.com +.compusa.com/ @@ -5233,7 +5118,7 @@ new action Parameter: - Any URL. + An absolute URL or a single pcrs command. @@ -5242,21 +5127,22 @@ new action Notes: - This action is useful to replace whole documents with ones of your - choosing. This can be used to enforce safe surfing, or just as a simple - convenience. - - - You can do the same by combining the actions - block, - handle-as-image and - set-image-blocker{URL}. 
- It doesn't sound right for non-image documents, and that's why this action - was created. + Requests to which this action applies are answered with an + HTTP redirect to URLs of your choosing. The new URL is + either provided as a parameter, or derived by applying a + single pcrs command to the original URL. This action will be ignored if you use it together with block. + It can be combined with + fast-redirects{check-decoded-url} + to redirect to a decoded version of a rewritten URL. + + + Use this action carefully: make sure not to create redirection loops + and be aware that using your own redirects might make it + possible to fingerprint your requests. @@ -5270,8 +5156,15 @@ new action example.com/stylesheet\.css # Create a short, easy to remember nickname for a favorite site +# (relies on the browser to accept and forward invalid URLs to &my-app;) { +redirect{http://www.privoxy.org/user-manual/actions-file.html} } - a + a + +# Always use the expanded view for Undeadly.org articles +# (Note the $ at the end of the URL pattern to make sure +# the request for the rewritten URL isn't redirected as well) +{+redirect{s@$@&mode=expanded@}} +undeadly.org/cgi\?action=article&sid=\d*$ @@ -5412,6 +5305,86 @@ my-internal-testing-server.void + + +server-header-filter + + + + Typical use: + + + Rewrite or remove single server headers. + + + + + + Effect: + + + All server headers to which this action applies are filtered on-the-fly + through the specified regular expression based substitutions. + + + + + + Type: + + + Parameterized. + + + + + Parameter: + + + The name of a server-header filter, as defined in one of the + filter files. + + + + + + Notes: + + + Server-header filters are applied to each header on its own, not to + all at once. This makes it easier to diagnose problems, but on the downside + you can't write filters that only change header x if header y's value is z. 
+ + + Server-header filters are executed after the other header actions have finished + and use their output as input. + + + Please refer to the filter file chapter + to learn which server-header filters are available by default, and how to + create your own. + + + + + Example usage (section): + + + +{+server-header-filter{html-to-xml}} +example.org/xml-instance-that-is-delivered-as-html + +{+server-header-filter{xml-to-html}} +example.org/instance-that-is-delivered-as-xml-but-is-not + + + + + + + + + session-cookies-only @@ -5935,6 +5908,7 @@ that also explains why and how aliases are used: ########################################################################## { \ -add-header \ + -client-header-filter{hide-tor-exit-notation} \ -block \ -content-type-overwrite \ -crunch-client-header \ @@ -5965,16 +5939,11 @@ that also explains why and how aliases are used: -filter{fun} \ -filter{crude-parental} \ +filter{ie-exploits} \ - -filter-client-headers \ - -filter-server-headers \ - -filter-google \ - -filter-yahoo \ - -filter-msn \ - -filter-blogspot \ - -filter-xml-to-html \ - -filter-html-to-xml \ - -filter-no-ping \ - -filter-hide-tor-exit-notation \ + -filter{google} \ + -filter{yahoo} \ + -filter{msn} \ + -filter{blogspot} \ + -filter{no-ping} \ -force-text-mode \ -handle-as-empty-document \ -handle-as-image \ @@ -5993,6 +5962,8 @@ that also explains why and how aliases are used: -redirect \ -send-vanilla-wafer \ -send-wafer \ + -server-header-filter{xml-to-html} \ + -server-header-filter{html-to-xml} \ +session-cookies-only \ +set-image-blocker{pattern} \ -treat-forbidden-connects-like-blocks \ @@ -6529,11 +6500,23 @@ stupid-server.example.com/ Filter Files - On-the-fly text substitutions that can be invoked through the - filter action need + On-the-fly text substitutions need to be defined in a filter file. Once defined, they - can then be invoked as an action. Multiple filter files can be - defined through the action. 
+ + + + &my-app; supports three different filter actions: + filter to + rewrite the content that is sent to the client, + client-header-filter + to rewrite headers that are sent by the client, and + server-header-filter + to rewrite headers that are sent by the server. + + + + Multiple filter files can be defined through the filterfile config directive. The filters as supplied by the developers will be found in default.filter. It is recommended that any locally @@ -6543,33 +6526,30 @@ stupid-server.example.com/ - Typical reasons for doing these kinds of substitutions are to eliminate - common annoyances in HTML and JavaScript, such as pop-up windows, + Common tasks for content filters are to eliminate common annoyances in + HTML and JavaScript, such as pop-up windows, exit consoles, crippled windows without navigation tools, the infamous <BLINK> tag etc, to suppress images with certain width and height attributes (standard banner sizes or web-bugs), - or just to have fun. The possibilities are endless. + or just to have fun. - Filtering works on any text-based document type, including + Content filtering works on any text-based document type, including HTML, JavaScript, CSS etc. (all text/* MIME types, except text/plain). Substitutions are made at the source level, so if you want to roll your own filters, you should first be familiar with HTML syntax, - and, of course, regular expressions. By default, filters are only applied - to the raw document content, but can be extended to the HTTP headers with - the supplemental actions: - filter-client-headers and - filter-server-headers. + and, of course, regular expressions. Just like the actions files, the filter file is organized in sections, which are called filters - here. 
Each filter consists of a heading line, that starts with one of the + keywords FILTER:, + CLIENT-HEADER-FILTER: or SERVER-HEADER-FILTER: + followed by the filter's name, and a short (one line) description of what it does. Below that line come the jobs, i.e. lines that define the actual text substitutions. By convention, the name of a filter @@ -6586,7 +6566,7 @@ stupid-server.example.com/ - A filter header line for a filter called foo could look + A content filter header line for a filter called foo could look like this: @@ -6624,7 +6604,7 @@ stupid-server.example.com/ Filter File Tutorial - Now, let's complete our foo filter. We have already defined + Now, let's complete our foo content filter. We have already defined the heading, but the jobs are still missing. Since all it does is to replace foo with bar, there is only one (trivial) job needed: @@ -7247,7 +7227,7 @@ pre-defined filters for your convenience: xml-to-html - Header filter to change the Content-Type from xml to html. + Server-header filter to change the Content-Type from xml to html. @@ -7256,7 +7236,7 @@ pre-defined filters for your convenience: html-to-xml - Header filter to change the Content-Type from html to xml. + Server-header filter to change the Content-Type from html to xml. @@ -7275,9 +7255,33 @@ pre-defined filters for your convenience: hide-tor-exit-notation - Header filter to remove the Tor exit node notation + Client-header filter to remove the Tor exit node notation found in Host and Referer headers. + + If &my-app; and Tor are chained and &my-app; + is configured to use socks4a, one can use http://www.example.org.foobar.exit/ + to access the host www.example.org through the + Tor exit node foobar. + + + As the HTTP client isn't aware of this notation, it treats the + whole string www.example.org.foobar.exit as host and uses it + for the Host and Referer headers. From the + server's point of view the resulting headers are invalid and can cause problems. 
+ + + An invalid Referer header can trigger hot-linking + protections, an invalid Host header will make it impossible for + the server to find the right vhost (several domains hosted on the same IP address). + + + This client-header filter removes the foo.exit part in those headers + to prevent the mentioned problems. Note that it only modifies + the HTTP headers, it doesn't make it impossible for the server + to detect your Tor exit node based on the IP address + the request is coming from. + @@ -8082,6 +8086,7 @@ Requests {-add-header -block + -client-header-filter{hide-tor-exit-notation} -content-type-overwrite -crunch-client-header -crunch-if-none-match @@ -8116,12 +8121,7 @@ Requests -filter {yahoo} -filter {msn} -filter {blogspot} - -filter {xml-to-html} - -filter {html-to-xml} -filter {no-ping} - -filter{hide-tor-exit-notation} - -filter-client-headers - -filter-server-headers -force-text-mode -handle-as-empty-document -handle-as-image @@ -8140,6 +8140,8 @@ Requests -redirect -send-vanilla-wafer -send-wafer + -server-header-filter{xml-to-html} + -server-header-filter{html-to-xml} +session-cookies-only +set-image-blocker {pattern} -treat-forbidden-connects-like-blocks } @@ -8220,6 +8222,7 @@ In file: user.action [ View ] [ Edit ][ View ] [ Edit ][ View ] [ Edit ] @@ -8347,6 +8347,7 @@ In file: user.action [ View ] [ Edit ][ View ] [ Edit ][ View ] [ Edit ][ View ] [ Edit ]
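The redirect hunk above derives the new URL by applying a single pcrs command to the original URL, and relies on the trailing $ in the URL pattern to avoid a redirection loop. The following is a minimal Python sketch of that interaction, not Privoxy's implementation: the helper names and the article id 20070405 are hypothetical, the command parser handles only the plain s@pattern@replacement@ form (no flags, no escaped delimiters, no $1-style backreferences), and Privoxy's path matching is approximated with an ordinary regular expression anchored at the start of the path.

```python
import re


def apply_pcrs_substitution(command: str, text: str) -> str:
    """Apply a simplified 's<d>pattern<d>replacement<d>' command once.

    Assumption: no flags, no escaped delimiters and no backreferences;
    real pcrs commands support more than this sketch does.
    """
    if len(command) < 4 or command[0] != "s":
        raise ValueError("not a substitution command: %r" % command)
    delimiter = command[1]
    parts = command[2:].split(delimiter)
    if len(parts) != 3 or parts[2] != "":
        raise ValueError("unsupported command form: %r" % command)
    pattern, replacement = parts[0], parts[1]
    # A function replacement keeps the replacement text literal.
    return re.sub(pattern, lambda match: replacement, text, count=1)


# Approximation of the path part of the URL pattern from the example:
# undeadly.org/cgi\?action=article&sid=\d*$
PATH_PATTERN = re.compile(r"/cgi\?action=article&sid=\d*$")


def redirect_applies(path: str) -> bool:
    """Return True if the (approximated) URL pattern matches the path."""
    return PATH_PATTERN.match(path) is not None


if __name__ == "__main__":
    original_path = "/cgi?action=article&sid=20070405"  # hypothetical id
    original_url = "http://undeadly.org" + original_path

    rewritten_url = apply_pcrs_substitution("s@$@&mode=expanded@", original_url)
    rewritten_path = rewritten_url[len("http://undeadly.org"):]

    print(rewritten_url)
    print(redirect_applies(original_path))   # first request is redirected
    print(redirect_applies(rewritten_path))  # rewritten request is left alone
```

Without the trailing $ in the pattern, the rewritten path would match again and every redirected request would be redirected once more, which is the loop the comment in the example warns about.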