X-Git-Url: http://www.privoxy.org/gitweb/?p=privoxy.git;a=blobdiff_plain;f=doc%2Fwebserver%2Fuser-manual%2Ffilter-file.html;h=d81f159060f118c96cfdf5af601f1180c9c02125;hp=99401da273b731a5fdf1f56cbb7fd6dad2cf1131;hb=e3c12117d30c2f42bd47c929099f95295f2c3404;hpb=00ff6723cacb0c08cbf3f1044e8639a89ebc23d7 diff --git a/doc/webserver/user-manual/filter-file.html b/doc/webserver/user-manual/filter-file.html index 99401da2..d81f1590 100644 --- a/doc/webserver/user-manual/filter-file.html +++ b/doc/webserver/user-manual/filter-file.html @@ -1,23 +1,26 @@ +
On-the-fly text substitutions that can be invoked through the - filter action need +> On-the-fly text substitutions need to be defined in a "filter file""action". Mulitple filter files can be - defined through the .
Privoxy supports three different filter actions: + filter to + rewrite the content that is sent to the client, + client-header-filter + to rewrite headers that are sent by the client, and + server-header-filter + to rewrite headers that are sent by the server.
Privoxy also supports two tagger actions: + client-header-tagger + and + server-header-tagger. + Taggers and filters use the same syntax in the filter files; the difference + is that taggers don't modify the text they are filtering, but use a rewritten + version of the filtered text as tag. The tags can then be used to change the + applying actions through sections with tag-patterns.
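The difference between a filter and a tagger can be illustrated with a minimal Python sketch. This is not Privoxy's implementation: the function names `apply_filter` and `apply_tagger`, the sample header, and the rule that a tag is produced only when the pattern matches are assumptions made for illustration.

```python
import re


def apply_filter(text, pattern, replacement):
    """Filter: the rewritten text replaces the original."""
    return re.sub(pattern, replacement, text)


def apply_tagger(text, pattern, replacement):
    """Tagger: the original text is returned unmodified.

    The rewritten version is only used as a tag. This sketch assumes
    that no tag is produced when the pattern does not match.
    """
    rewritten, hits = re.subn(pattern, replacement, text)
    tag = rewritten if hits > 0 else None
    return text, tag


# Hypothetical client header and pattern, for illustration only.
header = "User-Agent: ExampleBrowser/1.0"
pattern = r"^User-Agent:\s*(.*)$"

filtered = apply_filter(header, pattern, "User-Agent: anonymous")
unchanged, tag = apply_tagger(header, pattern, r"\1")
```

Here `filtered` carries the rewritten header, while the tagger leaves the header as it was and only yields the captured text as a tag that later rules could match against.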
Multiple filter files can be defined through the filterfile config directive. The filters - as supplied by the developers will be found in + as supplied by the developers are located in default.filter.
Typical reasons for doing these kinds of substitutions are to eliminate - common annoyances in HTML and JavaScript, such as pop-up windows, +> Common tasks for content filters are to eliminate common annoyances in + HTML and JavaScript, such as pop-up windows, exit consoles, crippled windows without navigation tools, the infamous <BLINK> tag etc, to suppress images with certain width and height attributes (standard banner sizes or web-bugs), - or just to have fun. The possibilities are endless.
Filtering works on any text-based document type, including - HTML, JavaScript, CSS etc. (all text/* - MIME types, except Enabled content filters are applied to any content whose + "Content Type" header is recognised as a sign + of text-based content, with the exception of text/plain). - Substitutions are made at the source level, so if you want to . + Use the force-text-mode action + to also filter other content.
Substitutions are made at the source level, so if you want to "roll your own" filters, you should first be familiar with HTML syntax, - and, of course, regular expressions. By default, filters are only applied - to the document content, but can be extended to the headers with - the supplemental actions: - filter-client-headers and - filter-server-headers.
A filter header line for a filter called Filter definitions start with a header line that contains the filter + type, the filter name and the filter description. + A content filter header line for a filter called "foo" could look @@ -274,7 +326,16 @@ CLASS="LITERAL" > is supported, which turns the default to ungreedy matching.
If you are new to regular expressions, you might want to take a look at +> If you are new to + "Regular + Expressions", you might want to take a look at the Appendix on regular expressions
Now, let's complete our "foo" filter. We have already defined
+> content filter. We have already defined
the heading, but the jobs are still missing. Since all it does is to replace
\1 is
- a backreference to the first parenthesis just like $1 above,
@@ -679,7 +740,7 @@ CLASS="EMPHASIS"
>pattern, a backslash indicates
- a backreference, whereas in the 9.2. The Pre-defined Filters9.2. The Pre-defined Filters The distribution removes code that causes new windows to be opened with undesired properties, such as being
- full-screen, non-resizable, without location, status or menu bar etc.
+ full-screen, non-resizeable, without location, status or menu bar etc.
Use with caution. This is an aggressive filter, and can break sites that
+ rely heavily on JavaScript.
+ This is a very radical measure. It removes virtually all JavaScript event bindings, which
means that scripts can not react to user actions such as mouse movements or clicks, window
- resizing etc, anymore.
+ resizing etc, anymore. Use with caution!
We MARQUEE tags
are neutralized (yeah baby!), and browser windows will be created as
- resizable (as of course they should be!), and will have location,
+ resizeable (as of course they should be!), and will have location,
scroll and menu bars -- even if specified otherwise.
Most cookies are set in the HTTP dialogue, where they can be intercepted
+> Most cookies are set in the HTTP dialog, where they can be intercepted
by the
This filter disables HTML and JavaScript code that reads or sets cookies. Use
- it wherever you would also use the cookie crunch actions.
+> This filter disables most HTML and JavaScript code that reads or sets
+ cookies. It cannot detect all clever uses of these types of code, so it
+ should not be relied on as an absolute fix. Use it wherever you would also
+ use the cookie crunch actions.
Technical note: The filter works by redefining the window.open JavaScript
- function to a dummy function during the loading and rendering phase of each
- HTML page access, and restoring the function afterwards.
+ function to a dummy function, PrivoxyWindowOpen(),
+ during the loading and rendering phase of each HTML page access, and
+ restoring the function afterward.
+ This is recommended only for browsers that cannot perform this function
+ reliably themselves. And be aware that some sites require such windows
+ in order to function normally. Use with caution.
Recommended only for those who require extreme ad blocking. The default + block rules should catch 95+% of all ads without this filter enabled. +
Many consider windows that move, or resize themselves to be abusive. This filter neutralizes the related JavaScript code. Note that some sites might not display - or behave as intended when using this filter. + or behave as intended when using this filter. Use with caution.
A collection of text replacements to disable malicious HTML and JavaScript +> An experimental collection of text replacements to disable malicious HTML and JavaScript code that exploits known security holes in Internet Explorer.
A CSS based block for Google text ads. Also removes a width limitation + and the toolbar advertisement. +
Another CSS based block, this time for Yahoo text ads. And removes + a width limitation as well. +
Another CSS based block, this time for MSN text ads. And removes + tracking URLs, as well as a width limitation. +
Cleans up some Blogspot blogs. Read the fine print before using this one! +
This filter also intentionally removes some navigation stuff and sets the + page width to 100%. As a result, some rounded "corners" would + appear too early or not at all and as fixing this would require a browser + that understands background-size (CSS3), they are removed instead. +
Server-header filter to change the Content-Type from xml to html. +
Server-header filter to change the Content-Type from html to xml. +
Removes the non-standard ping attribute from + anchor and area HTML tags. +
Client-header filter to remove the Tor exit node notation + found in Host and Referer headers. +
If Privoxy and Tor are chained and Privoxy + is configured to use socks4a, one can use "http://www.example.org.foobar.exit/" + to access the host "www.example.org" through the + Tor exit node "foobar". +
As the HTTP client isn't aware of this notation, it treats the + whole string "www.example.org.foobar.exit" as host and uses it + for the "Host" and "Referer" headers. From the + server's point of view the resulting headers are invalid and can cause problems. +
An invalid "Referer" header can trigger "hot-linking" + protections; an invalid "Host" header will make it impossible for + the server to find the right vhost (several domains hosted on the same IP address). +
This client-header filter removes the "foo.exit" part in those headers + to prevent the mentioned problems. Note that it only modifies + the HTTP headers; it doesn't make it impossible for the server + to detect your Tor exit node based on the IP address + the request is coming from. +
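The header rewrite described above can be sketched in Python as follows. The regular expression and the function name `hide_tor_exit_notation` are hypothetical and are not copied from default.filter; the sketch only assumes that the notation has the form ".<node>.exit" at the end of the host part of a Host or Referer header.

```python
import re

# Hypothetical pattern: keep the header name and real host (group 1),
# drop the trailing ".<node>.exit" part of the host name.
EXIT_NOTATION = re.compile(
    r"^((?:Host|Referer):\s*(?:https?://)?[^/\s]*?)\.[^./\s]+\.exit(?=[/:\s]|$)",
    re.IGNORECASE,
)


def hide_tor_exit_notation(header_line):
    """Return the header line with the Tor exit node notation removed.

    Header lines that do not contain the notation are returned unchanged.
    """
    return EXIT_NOTATION.sub(r"\1", header_line)


host = hide_tor_exit_notation("Host: www.example.org.foobar.exit")
referer = hide_tor_exit_notation(
    "Referer: http://www.example.org.foobar.exit/index.html"
)
```

As in the filter itself, only the header text changes; the exit node that the request actually leaves through is not affected.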