X-Git-Url: http://www.privoxy.org/gitweb/?p=privoxy.git;a=blobdiff_plain;f=doc%2Fwebserver%2Fuser-manual%2Ffilter-file.html;h=b4c348e8c054e59c2f334af22e847b0cd74597be;hp=5ea91beeedb43a61861946f07cd6b853981ab90b;hb=d0194faafeb1b286783e649b0628e51bc81840d5;hpb=ae6beecce49ef10b169c4b843580985430bc698b diff --git a/doc/webserver/user-manual/filter-file.html b/doc/webserver/user-manual/filter-file.html index 5ea91bee..b4c348e8 100644 --- a/doc/webserver/user-manual/filter-file.html +++ b/doc/webserver/user-manual/filter-file.html @@ -1,13 +1,13 @@ +
On-the-fly text substitutions that can be invoked through the - filter action need +> On-the-fly text substitutions need to be defined in a "filter file""action". Multiple filter files can be - defined through the .
Privoxy supports three different filter actions: + filter to + rewrite the content that is sent to the client, + client-header-filter + to rewrite headers that are sent by the client, and + server-header-filter + to rewrite headers that are sent by the server.
Privoxy also supports two tagger actions: + client-header-tagger + and + server-header-tagger. + Taggers and filters use the same syntax in the filter files; the difference + is that taggers don't modify the text they are filtering, but use a rewritten + version of the filtered text as a tag. The tags can then be used to change the + applying actions through sections with tag-patterns.
Multiple filter files can be defined through the filterfile config directive. The filters - as supplied by the developers will be found in + as supplied by the developers are located in default.filter.
Typical reasons for doing these kinds of substitutions are to eliminate - common annoyances in HTML and JavaScript, such as pop-up windows, +> Common tasks for content filters are to eliminate common annoyances in + HTML and JavaScript, such as pop-up windows, exit consoles, crippled windows without navigation tools, the infamous <BLINK> tag etc, to suppress images with certain width and height attributes (standard banner sizes or web-bugs), - or just to have fun. The possibilities are endless.
Filtering works on any text-based document type, including - HTML, JavaScript, CSS etc. (all text/* - MIME types, except Enabled content filters are applied to any content whose + "Content-Type" header is recognised as a sign + of text-based content, with the exception of text/plain). - Substitutions are made at the source level, so if you want to . + Use the force-text-mode action + to also filter other content.
Substitutions are made at the source level, so if you want to "roll your own" filters, you should first be familiar with HTML syntax, - and, of course, regular expressions. By default, filters are only applied - to the raw document content, but can be extended to the HTTP headers with - the supplemental actions: - filter-client-headers and - filter-server-headers.
A filter header line for a filter called Filter definitions start with a header line that contains the filter
+ type, the filter name and the filter description.
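As a rough illustration of the three-part header line described above, the following Python sketch splits such a line into type, name and description. The helper name `parse_filter_header` is hypothetical. The keyword spellings are an assumption: they are taken to be the upper-case forms of the five action names mentioned earlier, followed by a colon. This is not Privoxy's own parser.

```python
# Assumed header keywords: upper-case forms of the filter and tagger actions.
FILTER_TYPES = {
    "FILTER",
    "CLIENT-HEADER-FILTER",
    "SERVER-HEADER-FILTER",
    "CLIENT-HEADER-TAGGER",
    "SERVER-HEADER-TAGGER",
}


def parse_filter_header(line):
    """Split a header line into (filter type, filter name, description)."""
    keyword, sep, rest = line.partition(":")
    keyword = keyword.strip()
    if not sep or keyword not in FILTER_TYPES:
        raise ValueError("not a filter header line: %r" % line)
    # The first whitespace-separated word is the name, the rest the description.
    parts = rest.split(None, 1)
    if not parts:
        raise ValueError("filter name missing in: %r" % line)
    name = parts[0]
    description = parts[1].strip() if len(parts) > 1 else ""
    return keyword, name, description


if __name__ == "__main__":
    print(parse_filter_header('FILTER: foo Replace all "foo" with "bar"'))
```

The description is free text, so only the first word after the colon is treated as the filter name.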
+ A content filter header line for a filter called "foo" could look
@@ -314,14 +366,14 @@ CLASS="SECT2"
> Now, let's complete our "foo" filter. We have already defined
+> content filter. We have already defined
the heading, but the jobs are still missing. Since all it does is to replace
 9.2. The Pre-defined Filters The distribution
Use with caution. This is an aggressive filter, and can break sites that
+ rely heavily on JavaScript.
+ This is a very radical measure. It removes virtually all JavaScript event bindings, which
means that scripts can not react to user actions such as mouse movements or clicks, window
- resizing etc, anymore.
+ resizing etc, anymore. Use with caution!
We This filter disables HTML and JavaScript code that reads or sets cookies. Use
- it wherever you would also use the cookie crunch actions.
+> This filter disables most HTML and JavaScript code that reads or sets
+ cookies. It cannot detect all clever uses of these types of code, so it
+ should not be relied on as an absolute fix. Use it wherever you would also
+ use the cookie crunch actions.
Technical note: The filter works by redefining the window.open JavaScript
- function to a dummy function during the loading and rendering phase of each
- HTML page access, and restoring the function afterward.
+ function to a dummy function, PrivoxyWindowOpen(),
+ during the loading and rendering phase of each HTML page access, and
+ restoring the function afterward.
+ This is recommended only for browsers that cannot perform this function
+ reliably themselves. And be aware that some sites require such windows
+ in order to function normally. Use with caution.
Recommended only for those who require extreme ad blocking. The default
+ block rules should catch 95+% of all ads without this filter enabled.
+ Many consider windows that move, or resize themselves to be abusive. This filter
neutralizes the related JavaScript code. Note that some sites might not display
- or behave as intended when using this filter.
+ or behave as intended when using this filter. Use with caution.
A collection of text replacements to disable malicious HTML and JavaScript
+> An experimental collection of text replacements to disable malicious HTML and JavaScript
code that exploits known security holes in Internet Explorer.
A CSS based block for Google text ads. Also removes a width limitation
+ and the toolbar advertisement.
+ Another CSS based block, this time for Yahoo text ads. It also removes
+ a width limitation.
+ Another CSS based block, this time for MSN text ads. It also removes
+ tracking URLs and a width limitation.
+ Cleans up some Blogspot blogs. Read the fine print before using this one!
+ This filter also intentionally removes some navigation stuff and sets the
+ page width to 100%. As a result, some rounded "corners" would
+ appear too early or not at all and as fixing this would require a browser
+ that understands background-size (CSS3), they are removed instead.
+ Server-header filter to change the Content-Type from xml to html.
+ Server-header filter to change the Content-Type from html to xml.
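The two Content-Type rewrites above can be pictured as a single substitution on the header line. The Python sketch below is only an approximation under stated assumptions: the function names are hypothetical, and the set of MIME types treated as "xml" (`text/xml`, `application/xml`, `application/xhtml+xml`) is a guess, not the pattern used by the shipped filters.

```python
import re

# Assumed set of XML media types; the shipped filter may match differently.
XML_TYPE = re.compile(
    r"^(Content-Type:\s*)(?:application|text)/(?:xhtml\+)?xml(?=\s*;|\s*$)",
    re.IGNORECASE,
)
HTML_TYPE = re.compile(
    r"^(Content-Type:\s*)text/html(?=\s*;|\s*$)",
    re.IGNORECASE,
)


def xml_to_html(header):
    """Rewrite an XML Content-Type header to text/html, keeping parameters."""
    return XML_TYPE.sub(r"\g<1>text/html", header)


def html_to_xml(header):
    """Rewrite a text/html Content-Type header to text/xml, keeping parameters."""
    return HTML_TYPE.sub(r"\g<1>text/xml", header)


if __name__ == "__main__":
    print(xml_to_html("Content-Type: text/xml; charset=utf-8"))
    print(html_to_xml("Content-Type: text/html"))
```

Headers other than Content-Type, and Content-Type values outside the assumed set, pass through unchanged.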
+ Removes the non-standard ping attribute from
+ anchor and area HTML tags.
+ Client-header filter to remove the Tor exit node notation
+ found in Host and Referer headers.
+ If Privoxy and Tor are chained and Privoxy
+ is configured to use socks4a, one can use "http://www.example.org.foobar.exit/"
+ to access the host "www.example.org" through the
+ Tor exit node "foobar".
+ As the HTTP client isn't aware of this notation, it treats the
+ whole string "www.example.org.foobar.exit" as host and uses it
+ for the "Host" and "Referer" headers. From the
+ server's point of view the resulting headers are invalid and can cause problems.
+ An invalid "Referer" header can trigger "hot-linking"
+ protections, an invalid "Host" header will make it impossible for
+ the server to find the right vhost (several domains hosted on the same IP address).
+ This client-header filter removes the "foo.exit" part in those headers
+ to prevent the mentioned problems. Note that it only modifies
+ the HTTP headers, it doesn't make it impossible for the server
+ to detect your Tor exit node based on the IP address
+ the request is coming from.
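To make the header rewrite concrete, here is a minimal Python sketch of the idea, assuming each header arrives as a single line such as `Host: www.example.org.foobar.exit`. The regular expression and the function name `hide_tor_exit_notation` are illustrative assumptions and are not the pattern shipped in default.filter.

```python
import re

# Assumed pattern: in Host and Referer headers, drop the ".<node>.exit"
# suffix that directly follows the real host name.
EXIT_NOTATION = re.compile(
    r"^((?:Host|Referer):\s*(?:https?://)?[^/\s]*?)\.[^./\s]+\.exit(?=[:/\s]|$)",
    re.IGNORECASE,
)


def hide_tor_exit_notation(header):
    """Return the header with the Tor exit node notation removed, if present."""
    return EXIT_NOTATION.sub(r"\g<1>", header, count=1)


if __name__ == "__main__":
    print(hide_tor_exit_notation("Host: www.example.org.foobar.exit"))
    print(hide_tor_exit_notation(
        "Referer: http://www.example.org.foobar.exit/index.html"))
```

Only the header text changes; as noted above, the IP address the request is coming from is unaffected.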
+ 9.1. Filter File Tutorial
9.1. Filter File Tutorial