X-Git-Url: http://www.privoxy.org/gitweb/?p=privoxy.git;a=blobdiff_plain;f=doc%2Fwebserver%2Fuser-manual%2Ffilter-file.html;h=e0de605f812edb6c1748feec413339289d56e5e3;hp=5ea91beeedb43a61861946f07cd6b853981ab90b;hb=473cfd051580edfa1e2a3f6beeb9a0d09a8253fd;hpb=ae6beecce49ef10b169c4b843580985430bc698b
diff --git a/doc/webserver/user-manual/filter-file.html b/doc/webserver/user-manual/filter-file.html
index 5ea91bee..e0de605f 100644
--- a/doc/webserver/user-manual/filter-file.html
+++ b/doc/webserver/user-manual/filter-file.html
@@ -1,13 +1,13 @@
+ Filter Files +HREF="../p_doc.css"> Privoxy 3.0.5 User ManualPrivoxy 3.0.9 User Manual9. Filter Files9. Filter Files

On-the-fly text substitutions that can be invoked through the - filter action need +> On-the-fly text substitutions need to be defined in a "filter file""action". Multiple filter files can be - defined through the .

Privoxy supports three different filter actions: + filter to + rewrite the content that is sent to the client, + client-header-filter + to rewrite headers that are sent by the client, and + server-header-filter + to rewrite headers that are sent by the server.

Privoxy also supports two tagger actions: + client-header-tagger + and + server-header-tagger. + Taggers and filters use the same syntax in the filter files; the difference + is that taggers don't modify the text they are filtering, but use a rewritten + version of the filtered text as a tag. The tags can then be used to change + which actions apply through sections with tag-patterns.
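The difference between a filter and a tagger can be illustrated with a few lines of Python. This is only a sketch under stated assumptions, not Privoxy's implementation: the function names, the sample User-Agent job and the choice to return `None` when nothing matches are all made up for this example, and Python's `re` module stands in for the substitution syntax used in filter files.

```python
import re
from typing import List, Optional


def apply_filter(text: str, pattern: str, replacement: str) -> str:
    """A filter returns the rewritten text, which replaces the original."""
    return re.sub(pattern, replacement, text)


def apply_tagger(text: str, pattern: str, replacement: str) -> Optional[str]:
    """A tagger leaves the text alone; the rewritten version becomes a tag.

    Returning None when the pattern does not match is an assumption of
    this sketch.
    """
    rewritten, hits = re.subn(pattern, replacement, text)
    return rewritten if hits else None


def section_applies(tag_pattern: str, tags: List[str]) -> bool:
    """A section guarded by a tag pattern applies if any tag matches it."""
    return any(re.search(tag_pattern, tag) for tag in tags)


header = "User-Agent: ExampleBot/1.0"
# Hypothetical job: reduce the header to a short tag naming the client.
job = (r"^User-Agent:\s*(\w+).*$", r"agent-\1")

tag = apply_tagger(header, *job)  # the header string itself is not changed
```

The same job passed to `apply_filter` would replace the header, whereas `apply_tagger` only derives a tag from it, which `section_applies` can then match against a tag pattern.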

Multiple filter files can be defined through the filterfile config directive. The filters - as supplied by the developers will be found in + as supplied by the developers are located in default.filter.

Typical reasons for doing these kinds of substitutions are to eliminate - common annoyances in HTML and JavaScript, such as pop-up windows, +> Common tasks for content filters are to eliminate common annoyances in + HTML and JavaScript, such as pop-up windows, exit consoles, crippled windows without navigation tools, the infamous <BLINK> tag etc, to suppress images with certain width and height attributes (standard banner sizes or web-bugs), - or just to have fun. The possibilities are endless.

Filtering works on any text-based document type, including - HTML, JavaScript, CSS etc. (all text/* - MIME types, except Enabled content filters are applied to any content whose + "Content Type" header is recognised as a sign + of text-based content, with the exception of text/plain). - Substitutions are made at the source level, so if you want to . + Use the force-text-mode action + to also filter other content.

Substitutions are made at the source level, so if you want to "roll your own" filters, you should first be familiar with HTML syntax, - and, of course, regular expressions. By default, filters are only applied - to the raw document content, but can be extended to the HTTP headers with - the supplemental actions: - filter-client-headers and - filter-server-headers.

Just like the filters - here. Each filter consists of a heading line, that starts with the + here. Each filter consists of a heading line, that starts with one of the keywordkeywords FILTER:, followed by - the filter's , + CLIENT-HEADER-FILTER: or SERVER-HEADER-FILTER: + followed by the filter's actions file.

A filter header line for a filter called Filter definitions start with a header line that contains the filter + type, the filter name and the filter description. + A content filter header line for a filter called "foo" could look @@ -314,14 +366,14 @@ CLASS="SECT2" >
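As a rough illustration of the header line layout described above (filter type, filter name, filter description), the following Python sketch splits such a line into its three parts. It is not how Privoxy parses filter files; the function name, the `FilterHeader` type and the sample description are assumptions for this example, and only the three keywords named in the text are accepted.

```python
from typing import NamedTuple

# Keywords named in the text; tagger keywords are deliberately left out
# because the text does not list them.
FILTER_KEYWORDS = ("FILTER:", "CLIENT-HEADER-FILTER:", "SERVER-HEADER-FILTER:")


class FilterHeader(NamedTuple):
    filter_type: str
    name: str
    description: str


def parse_header_line(line: str) -> FilterHeader:
    """Split 'KEYWORD: name description...' into its three parts."""
    parts = line.split(None, 2)
    if not parts or parts[0] not in FILTER_KEYWORDS:
        raise ValueError("not a filter header line: %r" % line)
    if len(parts) < 2:
        raise ValueError("filter name missing: %r" % line)
    description = parts[2].strip() if len(parts) > 2 else ""
    return FilterHeader(parts[0].rstrip(":"), parts[1], description)


# Hypothetical header line for a content filter called "foo".
parsed = parse_header_line("FILTER: foo An example description")
```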

9.1. Filter File Tutorial


Now, let's complete our "foo" filter. We have already defined +> content filter. We have already defined the heading, but the jobs are still missing. Since all it does is to replace

9.2. The Pre-defined Filters

The distribution

Use with caution. This is an aggressive filter, and can break sites that + rely heavily on JavaScript. +

This is a very radical measure. It removes virtually all JavaScript event bindings, which means that scripts cannot react to user actions such as mouse movements or clicks, window - resizing etc, anymore. + resizing etc, anymore. Use with caution!

We

This filter disables HTML and JavaScript code that reads or sets cookies. Use - it wherever you would also use the cookie crunch actions. +> This filter disables most HTML and JavaScript code that reads or sets + cookies. It cannot detect all clever uses of these types of code, so it + should not be relied on as an absolute fix. Use it wherever you would also + use the cookie crunch actions.

Technical note: The filter works by redefining the window.open JavaScript - function to a dummy function during the loading and rendering phase of each - HTML page access, and restoring the function afterward. + function to a dummy function, PrivoxyWindowOpen(), + during the loading and rendering phase of each HTML page access, and + restoring the function afterward. +

This is recommended only for browsers that cannot perform this function + reliably themselves. And be aware that some sites require such windows + in order to function normally. Use with caution.

all pop-up windows from opening. - Note this should be used with more discretion than the above, since it is - more likely to break some sites that require pop-ups for normal usage. Use - with caution. + Note this should be used with even more discretion than the above, since + it is more likely to break some sites that require pop-ups for normal + usage. Use with caution.

Occasionally this filter will cause false positives on images that are not ads, but just happen to be of one of the standard banner sizes.

Recommended only for those who require extreme ad blocking. The default + block rules should catch 95+% of all ads without this filter enabled. +

Many consider windows that move or resize themselves to be abusive. This filter neutralizes the related JavaScript code. Note that some sites might not display - or behave as intended when using this filter. + or behave as intended when using this filter. Use with caution.

A collection of text replacements to disable malicious HTML and JavaScript +> An experimental collection of text replacements to disable malicious HTML and JavaScript code that exploits known security holes in Internet Explorer.

google

A CSS-based block for Google text ads. It also removes a width limitation + and the toolbar advertisement. +

yahoo

Another CSS-based block, this time for Yahoo text ads. It also removes + a width limitation. +

msn

Another CSS-based block, this time for MSN text ads. It also removes + tracking URLs and a width limitation. +

blogspot

Cleans up some Blogspot blogs. Read the fine print before using this one! +

This filter also intentionally removes some navigation stuff and sets the + page width to 100%. As a result, some rounded "corners" would + appear too early or not at all. As fixing this would require a browser + that understands background-size (CSS3), they are removed instead. +

xml-to-html

Server-header filter to change the Content-Type from xml to html. +

html-to-xml

Server-header filter to change the Content-Type from html to xml. +

no-ping

Removes the non-standard ping attribute from + anchor and area HTML tags. +
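The effect of such a filter can be sketched in Python as follows. This is a simplified illustration, not the job shipped in default.filter: the regular expression and the function name `strip_ping` are assumptions, and the pattern does not try to handle every corner case of HTML attribute syntax.

```python
import re

# Assumed pattern: a ping attribute (quoted or unquoted) inside an <a> or
# <area> start tag. Group 1 keeps everything in the tag before the attribute.
_PING_ATTR = re.compile(
    r"""(<(?:a|area)\b[^>]*?)\s+ping\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
    re.IGNORECASE,
)


def strip_ping(html: str) -> str:
    """Drop the ping attribute from anchor and area tags, keep the rest."""
    return _PING_ATTR.sub(r"\1", html)


cleaned = strip_ping('<a href="/x" ping="http://tracker.example/p">x</a>')
```

Other tags, and attributes whose names merely end in "ping", are left untouched by this pattern.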

hide-tor-exit-notation

Client-header filter to remove the Tor exit node notation + found in Host and Referer headers. +

If Privoxy and Tor are chained and Privoxy + is configured to use socks4a, one can use "http://www.example.org.foobar.exit/" + to access the host "www.example.org" through the + Tor exit node "foobar". +

As the HTTP client isn't aware of this notation, it treats the + whole string "www.example.org.foobar.exit" as host and uses it + for the "Host" and "Referer" headers. From the + server's point of view the resulting headers are invalid and can cause problems. +

An invalid "Referer" header can trigger "hot-linking" + protections; an invalid "Host" header will make it impossible for + the server to find the right vhost (several domains hosted on the same IP address). +

This client-header filter removes the "foobar.exit" part in those headers + to prevent the mentioned problems. Note that it only modifies + the HTTP headers; it doesn't make it impossible for the server + to detect your Tor exit node based on the IP address + the request is coming from. +
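The header rewrite described above can be sketched in Python. This is an illustration of the transformation only, not the job used by Privoxy: the regular expression and the function name `strip_exit_notation` are assumptions made for this example.

```python
import re

# Assumed pattern: ".<nickname>.exit" at the end of a host name, i.e.
# followed by a port, a path, whitespace or the end of the header.
_EXIT_NOTATION = re.compile(r"\.[^./\s]+\.exit(?=[:/\s]|$)", re.IGNORECASE)


def strip_exit_notation(header: str) -> str:
    """Remove the Tor exit notation from Host and Referer headers only."""
    name, sep, value = header.partition(":")
    if name.strip().lower() not in ("host", "referer"):
        return header  # other headers pass through unchanged
    return name + sep + _EXIT_NOTATION.sub("", value)


host = strip_exit_notation("Host: www.example.org.foobar.exit")
referer = strip_exit_notation(
    "Referer: http://www.example.org.foobar.exit/index.html"
)
```

As the text notes, a rewrite like this only cleans up the headers; it does nothing about the IP address the request arrives from.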