X-Git-Url: http://www.privoxy.org/gitweb/?p=privoxy.git;a=blobdiff_plain;f=doc%2Fwebserver%2Fuser-manual%2Ffilter-file.html;h=e0de605f812edb6c1748feec413339289d56e5e3;hp=2ca2bb5b84677580d9a25f161ab560384ab9142e;hb=473cfd051580edfa1e2a3f6beeb9a0d09a8253fd;hpb=a5b1999794b4b0faa68812c0b8b2861316ae8341 diff --git a/doc/webserver/user-manual/filter-file.html b/doc/webserver/user-manual/filter-file.html index 2ca2bb5b..e0de605f 100644 --- a/doc/webserver/user-manual/filter-file.html +++ b/doc/webserver/user-manual/filter-file.html @@ -1,23 +1,28 @@ +
All text substitutions that can be invoked through the +> On-the-fly text substitutions need + to be defined in a "filter file". Once defined, they + can then be invoked as an "action".
Privoxy supports three different filter actions: filter action - must first be defined in the filter file, which is typically - called default.filter and which can be - selected through the to + rewrite the content that is sent to the client, + client-header-filter + to rewrite headers that are sent by the client, and + server-header-filter + to rewrite headers that are sent by the server.
Privoxy also supports two tagger actions: + client-header-tagger + and + server-header-tagger. + Taggers and filters use the same syntax in the filter files; the difference + is that taggers don't modify the text they are filtering, but use a rewritten + version of the filtered text as a tag. The tags can then be used to change which + actions apply through sections with tag-patterns.
Multiple filter files can be defined through the filterfile config - option.
config directive. The filters + as supplied by the developers are located in + default.filter. It is recommended that any locally + defined or modified filters go in a separately defined file such as + user.filter. +Typical reasons for doing such substitutions are to eliminate - common annoyances in HTML and JavaScript, such as pop-up windows, +> Common tasks for content filters are to eliminate common annoyances in + HTML and JavaScript, such as pop-up windows, exit consoles, crippled windows without navigation tools, the infamous <BLINK> tag etc, to suppress images with certain width and height attributes (standard banner sizes or web-bugs), - or just to have fun. The possibilities are endless.
Filtering works on any text-based document type, including plain - text, HTML, JavaScript, CSS etc. (all Enabled content filters are applied to any content whose + "Content-Type" header is recognised as a sign + of text-based content, with the exception of text/* - MIME types). Substitutions are made at the source level, so if - you want to text/plain. + Use the force-text-mode action + to also filter other content.
Substitutions are made at the source level, so if you want to "roll your own" filters, you should be - familiar with HTML syntax.
"roll + your own" filters, you should first be familiar with HTML syntax, + and, of course, regular expressions.A filter header line for a filter called Filter definitions start with a header line that contains the filter + type, the filter name and the filter description. + A content filter header line for a filter called "foo" could look @@ -230,31 +319,35 @@ CLASS="LITERAL" >s/// operator. If you are familiar with Perl, you will find this to be quite intuitive, and may want to look at the - PCRS man page - for the subtle differences to Perl behaviour. Most notably, the non-standard - option letter U is supported, which turns the default - to ungreedy matching.
is supported, + which turns the default to ungreedy matching.If you are new to regular expressions, you might want to take a look at +> If you are new to + "Regular + Expressions", you might want to take a look at the Appendix on regular expressions, and see the Perl manual for the s/// operator's syntax and Perl-style regular expressions
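The difference between greedy and ungreedy matching can be illustrated with a minimal sketch. It uses Python's re module rather than PCRS, so there is no U option letter here; the ungreedy behaviour is written out explicitly with ".*?", and the sample string is made up for the illustration.

```python
import re

# Made-up sample input with two separate <b> elements.
html = "<b>one</b> and <b>two</b>"

# Greedy: ".*" runs to the last "</b>", so a single match swallows both elements.
greedy = re.sub(r"<b>.*</b>", "[X]", html)

# Ungreedy: ".*?" stops at the first "</b>", so each element is replaced on its own.
ungreedy = re.sub(r"<b>.*?</b>", "[X]", html)

print(greedy)
print(ungreedy)
```

When filtering HTML, the ungreedy form is usually the one that is wanted, since a greedy match can span from the first opening tag to the last closing tag on the page.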
Now, let's complete our "foo" filter. We have already defined
+> content filter. We have already defined
the heading, but the jobs are still missing. Since all it does is to replace
\1 is
- a backreference to the first parenthesis just like $1 above,
@@ -645,7 +740,7 @@ CLASS="EMPHASIS"
>pattern, a backslash indicates
- a backreference, whereas in the You get the idea? The distribution default.filter file contains a selection of
+pre-defined filters for your convenience: The purpose of this filter is to get rid of particularly annoying JavaScript abuse.
+ To that end, it
+ replaces JavaScript references to the browser's referrer information
+ with the string "Not Your Business!". This complements the hide-referrer action on the content level.
+ removes the bindings to the DOM's
+ unload
+ event which we feel has no right to exist and is responsible for most "exit consoles", i.e.
+ nasty windows that pop up when you close another one.
+ removes code that causes new windows to be opened with undesired properties, such as being
+ full-screen, non-resizeable, without location, status or menu bar etc.
+ Use with caution. This is an aggressive filter, and can break sites that
+ rely heavily on JavaScript.
+ This is a very radical measure. It removes virtually all JavaScript event bindings, which
+ means that scripts can no longer react to user actions such as mouse movements, clicks
+ or window resizing. Use with caution!
+ We strongly discourage using this filter as a default since it breaks
+ many legitimate scripts. It is meant for use only on extra-nasty sites (should you really
+ need to go there).
+ This filter will undo many common instances of HTML based abuse.
+ The BLINK and MARQUEE tags
+ are neutralized (yeah baby!), and browser windows will be created as
+ resizeable (as of course they should be!), and will have location,
+ scroll and menu bars -- even if specified otherwise.
+ Most cookies are set in the HTTP dialog, where they can be intercepted
+ by the
+ crunch-incoming-cookies
+ and crunch-outgoing-cookies
+ actions. But web sites increasingly make use of HTML meta tags and JavaScript
+ to sneak cookies to the browser on the content level.
+ This filter disables most HTML and JavaScript code that reads or sets
+ cookies. It cannot detect all clever uses of these types of code, so it
+ should not be relied on as an absolute fix. Use it wherever you would also
+ use the cookie crunch actions.
+ Disable any refresh tags if the interval is greater than nine seconds (so
+ that redirections done via refresh tags are not destroyed). This is useful
+ for dial-on-demand setups, or for those who find this HTML feature
+ annoying.
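The nine-second rule described above can be sketched roughly as follows. This is an illustration in Python, not the job from default.filter: the pattern, the function name disable_slow_refresh and the replacement comment are all assumptions, and the pattern only handles the common attribute order (http-equiv before content).

```python
import re

# Hypothetical pattern: <meta http-equiv="refresh" content="N..."> with N captured.
REFRESH_TAG = re.compile(
    r'<meta\s+http-equiv\s*=\s*["\']?refresh["\']?\s+content\s*=\s*["\']?(\d+)[^>]*>',
    re.IGNORECASE,
)


def disable_slow_refresh(html, limit=9):
    """Drop refresh tags whose interval exceeds `limit` seconds; keep the rest."""

    def replace(match):
        interval = int(match.group(1))
        if interval > limit:
            # Long interval: treat as an auto-reload and neutralize it.
            return "<!-- refresh tag removed -->"
        # Short interval: most likely a redirect, so leave it untouched.
        return match.group(0)

    return REFRESH_TAG.sub(replace, html)
```

A tag with content="0; url=/next" would pass through unchanged, while one with content="300" would be replaced by the comment.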
+ This filter attempts to prevent only "unsolicited" pop-up
+ windows from opening, yet still allow pop-up windows that the user
+ has explicitly chosen to open. It was added in version 3.0.1,
+ as an improvement over earlier such filters.
+ Technical note: The filter works by redefining the window.open JavaScript
+ function to a dummy function, PrivoxyWindowOpen(),
+ during the loading and rendering phase of each HTML page access, and
+ restoring the function afterward.
+ This is recommended only for browsers that cannot perform this function
+ reliably themselves. And be aware that some sites require such windows
+ in order to function normally. Use with caution.
+ Attempt to prevent all pop-up windows from opening.
+ Note this should be used with even more discretion than the above, since
+ it is more likely to break some sites that require pop-ups for normal
+ usage. Use with caution.
+ This is a helper filter that has no value if used alone. It makes the
+ banners-by-size and banners-by-link
+ (see below) filters more effective and should be enabled together with them.
+ This filter removes image tags purely based on what size they are. Fortunately
+ for us, many ads and banner images tend to conform to certain standardized
+ sizes, which makes this filter quite effective for ad stripping purposes.
+ Occasionally this filter will cause false positives on images that are not ads,
+ but just happen to be of one of the standard banner sizes.
+ Recommended only for those who require extreme ad blocking. The default
+ block rules should catch 95+% of all ads without this filter enabled.
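As a rough sketch of size-based stripping, the following Python removes image tags whose width and height match a small set of banner dimensions. The set BANNER_SIZES and the helper names are assumptions made for this illustration; the real filter uses its own list of sizes and its own patterns.

```python
import re

# Assumed sample of banner dimensions (width, height); not the filter's actual list.
BANNER_SIZES = {(468, 60), (728, 90)}

IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)


def _int_attr(tag, name):
    """Return the integer value of attribute `name` in `tag`, or None if absent."""
    match = re.search(r"\b" + name + r"\s*=\s*[\"']?(\d+)", tag, re.IGNORECASE)
    return int(match.group(1)) if match else None


def strip_banners(html):
    """Remove <img> tags whose width/height pair is in BANNER_SIZES."""

    def replace(match):
        tag = match.group(0)
        size = (_int_attr(tag, "width"), _int_attr(tag, "height"))
        return "" if size in BANNER_SIZES else tag

    return IMG_TAG.sub(replace, html)
```

Because only the dimensions are checked, a legitimate 468x60 image would be removed too, which is the kind of false positive mentioned above.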
+ This is an experimental filter that attempts to kill any banners if
+ their URLs seem to point to known or suspected click trackers. It is currently
+ not of much value and is not recommended for use by default.
+ Webbugs are small, invisible images (technically 1x1 GIF images) that
+ are used to track users across websites, and collect information on them.
+ As an HTML page is loaded by the browser, an embedded image tag causes the
+ browser to contact a third-party site, disclosing the tracking information
+ through the requested URL and/or cookies for that third-party domain, without
+ the user ever becoming aware of the interaction with the third-party site.
+ HTML-ized spam also uses a similar technique to verify email addresses.
+ This filter removes the HTML code that loads such "webbugs".
+ A rather special-purpose filter that can be used to enlarge textareas (those
+ multi-line text boxes in web forms) and turn off hard word wrap in them.
+ It was written for the sourceforge.net tracker system where such boxes are
+ a nuisance, but it can be handy on other sites, too.
+ It is not recommended to use this filter as a default.
+ Many consider windows that move or resize themselves to be abusive. This filter
+ neutralizes the related JavaScript code. Note that some sites might not display
+ or behave as intended when using this filter. Use with caution.
+ Some web designers seem to assume that everyone in the world will view their
+ web sites using the same browser brand and version, screen resolution etc,
+ because only that assumption could explain why they'd use static frame sizes,
+ yet prevent their frames from being resized by the user, should they be too
+ small to show their whole content.
+ This filter removes the related HTML code. It should only be applied to sites
+ which need it.
+ Many Microsoft products that generate HTML use non-standard extensions (read:
+ violations) of the ISO 8859-1 aka Latin-1 character set. This can cause those
+ HTML documents to display with errors on standards-compliant platforms.
+ This filter translates the MS-only characters into Latin-1 equivalents.
+ It is not necessary when using MS products, and will cause corruption of
+ all documents that use 8-bit character sets other than Latin-1. It is mostly
+ worthwhile for Europeans on non-MS platforms who sometimes see weird garbage
+ characters on some pages, or for user agents that don't correct for this on
+ the fly.
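The translation can be pictured with a small Python sketch that maps a few Windows-1252 bytes from the 0x80-0x9F range to plain ASCII stand-ins. The mapping table and the function name translate_ms_chars are assumptions for this illustration and cover only a handful of characters; the actual filter has its own, longer list.

```python
# Assumed partial mapping: Windows-1252 byte -> ASCII replacement.
CP1252_TO_ASCII = {
    0x85: b"...",  # horizontal ellipsis
    0x91: b"'",    # left single quotation mark
    0x92: b"'",    # right single quotation mark
    0x93: b'"',    # left double quotation mark
    0x94: b'"',    # right double quotation mark
    0x96: b"-",    # en dash
    0x97: b"-",    # em dash
}


def translate_ms_chars(data):
    """Replace the mapped bytes and pass every other byte through unchanged."""
    return b"".join(CP1252_TO_ASCII.get(byte, bytes([byte])) for byte in data)


print(translate_ms_chars(b"\x93Hi\x94 \x96 ok\x85"))
```

The sketch works on raw bytes without knowing the document's encoding, which also shows why such a translation corrupts documents in other 8-bit character sets: there the same byte values stand for different characters.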
+
+ A filter for Shockwave haters. As the name suggests, this filter strips code
+ out of web pages that is used to embed Shockwave Flash objects.
+ Change HTML code that embeds QuickTime objects so that kioskmode, which
+ prevents saving, is disabled.
+ Text replacements for subversive browsing fun. Make fun of your favorite
+ Monopolist or play buzzword bingo.
+ A demonstration-only filter that shows how Privoxy
+ can be used to delete web content on a keyword basis.
+ An experimental collection of text replacements to disable malicious HTML and JavaScript
+ code that exploits known security holes in Internet Explorer.
+ Presently, it only protects against Nimda and a cross-site scripting bug, and
+ would need active maintenance to provide more substantial protection.
+ Some web sites have very specific problems, the cure for which doesn't apply
+ anywhere else, or could even cause damage on other sites.
+ This is a collection of such site-specific cures which should only be applied
+ to the sites they were intended for, which is what the supplied
+ default.action file does. Users shouldn't need to change
+ anything regarding this filter.
+ A CSS based block for Google text ads. It also removes a width limitation
+ and the toolbar advertisement.
+ Another CSS based block, this time for Yahoo text ads. It also removes
+ a width limitation.
+ Another CSS based block, this time for MSN text ads. It also removes
+ tracking URLs and a width limitation.
+ Cleans up some Blogspot blogs. Read the fine print before using this one!
+ This filter also intentionally removes some navigation stuff and sets the
+ page width to 100%. As a result, some rounded "corners" would
+ appear too early or not at all, and as fixing this would require a browser
+ that understands background-size (CSS3), they are removed instead.
+ Server-header filter to change the Content-Type from xml to html.
+ Server-header filter to change the Content-Type from html to xml.
+ Removes the non-standard ping attribute from
+ anchor and area HTML tags.
+ Client-header filter to remove the Tor exit node notation
+ found in Host and Referer headers.
+ If Privoxy and Tor are chained and Privoxy
+ is configured to use socks4a, one can use "http://www.example.org.foobar.exit/"
+ to access the host "www.example.org" through the
+ Tor exit node "foobar".
+ As the HTTP client isn't aware of this notation, it treats the
+ whole string "www.example.org.foobar.exit" as host and uses it
+ for the "Host" and "Referer" headers. From the
+ server's point of view the resulting headers are invalid and can cause problems.
+ An invalid "Referer" header can trigger "hot-linking"
+ protections, and an invalid "Host" header will make it impossible for
+ the server to find the right vhost (several domains hosted on the same IP address).
+ This client-header filter removes the "foobar.exit" part in those headers
+ to prevent the mentioned problems. Note that it only modifies
+ the HTTP headers; it doesn't prevent the server
+ from detecting your Tor exit node based on the IP address
+ the request is coming from.
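The header rewrite can be approximated with the following Python sketch. It is not the job from default.filter: the pattern EXIT_NOTATION and the function strip_exit_notation are hypothetical, and the sketch assumes the exit notation is always a single ".name.exit" suffix at the end of the host name.

```python
import re

# Hypothetical pattern: ".<name>.exit" at the end of a host name, i.e. followed
# by a port separator, a path separator, or the end of the value.
EXIT_NOTATION = re.compile(r"\.[A-Za-z0-9-]+\.exit(?=[:/]|$)", re.IGNORECASE)


def strip_exit_notation(header_line):
    """Remove the exit notation from Host and Referer headers only."""
    name, sep, value = header_line.partition(":")
    if name.strip().lower() in ("host", "referer"):
        return name + sep + EXIT_NOTATION.sub("", value)
    # All other headers are passed through unchanged.
    return header_line


print(strip_exit_notation("Host: www.example.org.foobar.exit"))
print(strip_exit_notation("Referer: http://www.example.org.foobar.exit/page"))
```

With the example from the text, the Host value becomes www.example.org and the Referer URL points at www.example.org again; the request itself still leaves through the chosen exit node.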
+ 9.2. The Pre-defined Filters
+