1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
2 "http://www.w3.org/TR/html4/loose.dtd">
6 <title>Filter Files</title>
7 <meta name="GENERATOR" content=
8 "Modular DocBook HTML Stylesheet Version 1.79">
9 <link rel="HOME" title="Privoxy 3.0.22 User Manual" href="index.html">
10 <link rel="PREVIOUS" title="Actions Files" href="actions-file.html">
11 <link rel="NEXT" title="Privoxy's Template Files" href="templates.html">
12 <link rel="STYLESHEET" type="text/css" href="../p_doc.css">
13 <meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
14 <link rel="STYLESHEET" type="text/css" href="p_doc.css">
17 <body class="SECT1" bgcolor="#EEEEEE" text="#000000" link="#0000FF" vlink=
18 "#840084" alink="#0000FF">
19 <div class="NAVHEADER">
20 <table summary="Header navigation table" width="100%" border="0"
21 cellpadding="0" cellspacing="0">
23 <th colspan="3" align="center">Privoxy 3.0.22 User Manual</th>
27 <td width="10%" align="left" valign="bottom"><a href=
28 "actions-file.html" accesskey="P">Prev</a></td>
30 <td width="80%" align="center" valign="bottom"></td>
32 <td width="10%" align="right" valign="bottom"><a href=
33 "templates.html" accesskey="N">Next</a></td>
36 <hr align="left" width="100%">
<h1 class="SECT1"><a name="FILTER-FILE" id="FILTER-FILE">9. Filter
Files</a></h1>
43 <p>On-the-fly text substitutions need to be defined in a <span class=
44 "QUOTE">"filter file"</span>. Once defined, they can then be invoked as
45 an <span class="QUOTE">"action"</span>.</p>
47 <p><span class="APPLICATION">Privoxy</span> supports three different
48 pcrs-based filter actions: <tt class="LITERAL"><a href=
49 "actions-file.html#FILTER">filter</a></tt> to rewrite the content that is
sent to the client, <tt class="LITERAL"><a href=
"actions-file.html#CLIENT-HEADER-FILTER">client-header-filter</a></tt> to
rewrite headers that are sent by the client, and <tt class=
"LITERAL"><a href=
"actions-file.html#SERVER-HEADER-FILTER">server-header-filter</a></tt> to
rewrite headers that are sent by the server.</p>
57 <p><span class="APPLICATION">Privoxy</span> also supports two tagger
58 actions: <tt class="LITERAL"><a href=
59 "actions-file.html#CLIENT-HEADER-TAGGER">client-header-tagger</a></tt>
60 and <tt class="LITERAL"><a href=
61 "actions-file.html#SERVER-HEADER-TAGGER">server-header-tagger</a></tt>.
Taggers and filters use the same syntax in the filter files; the
difference is that taggers don't modify the text they are filtering, but
use a rewritten version of the filtered text as a tag. The tags can then be
65 used to change the applying actions through sections with <a href=
66 "actions-file.html#TAG-PATTERN">tag-patterns</a>.</p>
68 <p>Finally <span class="APPLICATION">Privoxy</span> supports the
69 <tt class="LITERAL"><a href=
70 "actions-file.html#EXTERNAL-FILTER">external-filter</a></tt> action to
71 enable <tt class="LITERAL"><a href=
72 "filter-file.html#EXTERNAL-FILTER-SYNTAX">external filters</a></tt>
73 written in proper programming languages.</p>
75 <p>Multiple filter files can be defined through the <tt class=
76 "LITERAL"><a href="config.html#FILTERFILE">filterfile</a></tt> config
77 directive. The filters as supplied by the developers are located in
78 <tt class="FILENAME">default.filter</tt>. It is recommended that any
79 locally defined or modified filters go in a separately defined file such
80 as <tt class="FILENAME">user.filter</tt>.</p>
82 <p>Common tasks for content filters are to eliminate common annoyances in
83 HTML and JavaScript, such as pop-up windows, exit consoles, crippled
windows without navigation tools, the infamous &lt;BLINK&gt; tag etc., to
85 suppress images with certain width and height attributes (standard banner
86 sizes or web-bugs), or just to have fun.</p>
<p>Enabled content filters are applied to any content whose <span class=
"QUOTE">"Content-Type"</span> header is recognised as a sign of
90 text-based content, with the exception of <tt class=
91 "LITERAL">text/plain</tt>. Use the <a href=
92 "actions-file.html#FORCE-TEXT-MODE">force-text-mode</a> action to also
93 filter other content.</p>
95 <p>Substitutions are made at the source level, so if you want to
96 <span class="QUOTE">"roll your own"</span> filters, you should first be
97 familiar with HTML syntax, and, of course, regular expressions.</p>
99 <p>Just like the <a href="actions-file.html">actions files</a>, the
100 filter file is organized in sections, which are called <span class=
101 "emphasis"><i class="EMPHASIS">filters</i></span> here. Each filter
consists of a heading line that starts with one of the <span class=
103 "emphasis"><i class="EMPHASIS">keywords</i></span> <tt class=
104 "LITERAL">FILTER:</tt>, <tt class="LITERAL">CLIENT-HEADER-FILTER:</tt> or
105 <tt class="LITERAL">SERVER-HEADER-FILTER:</tt> followed by the filter's
106 <span class="emphasis"><i class="EMPHASIS">name</i></span>, and a short
107 (one line) <span class="emphasis"><i class=
108 "EMPHASIS">description</i></span> of what it does. Below that line come
109 the <span class="emphasis"><i class="EMPHASIS">jobs</i></span>, i.e.
110 lines that define the actual text substitutions. By convention, the name
of a filter should describe what the filter <span class=
"emphasis"><i class="EMPHASIS">eliminates</i></span>. The description is used
in the <a href="http://config.privoxy.org/" target="_top">web-based user
interface</a>.</p>
116 <p>Once a filter called <tt class="REPLACEABLE"><i>name</i></tt> has been
117 defined in the filter file, it can be invoked by using an action of the
118 form +<tt class="LITERAL"><a href=
119 "actions-file.html#FILTER">filter</a>{<tt class=
120 "REPLACEABLE"><i>name</i></tt>}</tt> in any <a href=
121 "actions-file.html">actions file</a>.</p>
123 <p>Filter definitions start with a header line that contains the filter
124 type, the filter name and the filter description. A content filter header
line for a filter called <span class="QUOTE">"foo"</span> could look like
this:</p>

<table border="0" bgcolor="#E0E0E0" width="100%">
<tr>
<td>
<pre class="SCREEN">
FILTER: foo Replace all "foo" with "bar"
</pre>
</td>
</tr>
</table>
138 <p>Below that line, and up to the next header line, come the jobs that
139 define what text replacements the filter executes. They are specified in
140 a syntax that imitates <a href="http://www.perl.org/" target=
141 "_top">Perl</a>'s <tt class="LITERAL">s///</tt> operator. If you are
142 familiar with Perl, you will find this to be quite intuitive, and may
want to look at the PCRS documentation for the subtle differences from Perl.</p>
146 <p>Most notably, the non-standard option letter <tt class=
147 "LITERAL">U</tt> is supported, which turns the default to ungreedy
matching (add <tt class="LITERAL">?</tt> to quantifiers to turn them
greedy again).</p>
151 <p>The non-standard option letter <tt class="LITERAL">D</tt> (dynamic)
allows the use of the variables $host, $origin (the IP address the request
153 came from), $path and $url. They will be replaced with the value they
154 refer to before the filter is executed.</p>
156 <p>Note that '$' is a bad choice for a delimiter in a dynamic filter as
157 you might end up with unintended variables if you use a variable name
158 directly after the delimiter. Variables will be resolved without escaping
anything, therefore you also have to be careful not to choose delimiters
that appear in the replacement text. For example '&lt;' should be safe,
161 while '?' will sooner or later cause conflicts with $url.</p>
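<p>The delimiter problem can be sketched outside of <span class=
"APPLICATION">Privoxy</span>. The following Python snippet is only an
illustration: <tt class="LITERAL">expand_variables</tt> and <tt class=
"LITERAL">count_delimiters</tt> are hypothetical helpers, not part of
Privoxy, and the request values are made up. It inserts the variable values
verbatim, as described above, and then counts the delimiters. A well-formed
job has exactly three.</p>

```python
import re

# Hypothetical helpers for illustration; this is not Privoxy's implementation.
VARIABLE = re.compile(r"\$(host|origin|path|url)")


def expand_variables(job, values):
    """Insert the request values verbatim, without escaping anything."""
    return VARIABLE.sub(lambda match: values[match.group(1)], job)


def count_delimiters(job):
    """Count how often the job's delimiter (the character after 's') occurs."""
    return job.count(job[1])


# Made-up request data.
request = {
    "host": "www.example.org",
    "origin": "192.0.2.10",
    "path": "/search?q=privoxy",
    "url": "http://www.example.org/search?q=privoxy",
}

# '<' does not occur in the URL, so the job keeps its three delimiters.
safe_job = expand_variables("s<visit-marker<$url<D", request)

# '?' occurs in the URL, so the expanded job has one delimiter too many.
broken_job = expand_variables("s?visit-marker?$url?D", request)

print(count_delimiters(safe_job), count_delimiters(broken_job))
```

<p>With the <tt class="LITERAL">?</tt> delimiter, the query string in $url
adds a fourth delimiter, so the job can no longer be split into pattern,
substitute and options as intended.</p>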
163 <p>The non-standard option letter <tt class="LITERAL">T</tt> (trivial)
164 prevents parsing for backreferences in the substitute. Use it if you want
to include text like '$&amp;' in your substitute without quoting.</p>
167 <p>If you are new to <a href=
168 "http://en.wikipedia.org/wiki/Regular_expressions" target=
169 "_top"><span class="QUOTE">"Regular Expressions"</span></a>, you might
170 want to take a look at the <a href="appendix.html#REGEX">Appendix on
171 regular expressions</a>, and see the <a href=
172 "http://perldoc.perl.org/perlre.html" target="_top">Perl manual</a> for
173 <a href="http://perldoc.perl.org/perlop.html" target="_top">the
174 <tt class="LITERAL">s///</tt> operator's syntax</a> and <a href=
175 "http://perldoc.perl.org/perlre.html" target="_top">Perl-style regular
expressions</a> in general. The examples below might also help to get you
started.</p>
<h2 class="SECT2"><a name="AEN5285" id="AEN5285">9.1. Filter File
Tutorial</a></h2>
183 <p>Now, let's complete our <span class="QUOTE">"foo"</span> content
184 filter. We have already defined the heading, but the jobs are still
185 missing. Since all it does is to replace <span class=
186 "QUOTE">"foo"</span> with <span class="QUOTE">"bar"</span>, there is
187 only one (trivial) job needed:</p>
<table border="0" bgcolor="#E0E0E0" width="100%">
<tr>
<td>
<pre class="SCREEN">
s/foo/bar/
</pre>
</td>
</tr>
</table>
199 <p>But wait! Didn't the comment say that <span class=
200 "emphasis"><i class="EMPHASIS">all</i></span> occurrences of
201 <span class="QUOTE">"foo"</span> should be replaced? Our current job
202 will only take care of the first <span class="QUOTE">"foo"</span> on
203 each page. For global substitution, we'll need to add the <tt class=
204 "LITERAL">g</tt> option:</p>
<table border="0" bgcolor="#E0E0E0" width="100%">
<tr>
<td>
<pre class="SCREEN">
s/foo/bar/g
</pre>
</td>
</tr>
</table>
216 <p>Our complete filter now looks like this:</p>
<table border="0" bgcolor="#E0E0E0" width="100%">
<tr>
<td>
<pre class="SCREEN">
FILTER: foo Replace all "foo" with "bar"

s/foo/bar/g
</pre>
</td>
</tr>
</table>
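<p>If you want to try the effect of the <tt class="LITERAL">g</tt> option
without touching your configuration, a few lines of Python are enough.
This is only a sketch: Python's <tt class="LITERAL">re</tt> module stands
in for pcrs here, and its <tt class="LITERAL">count</tt> argument plays the
role of the missing or present <tt class="LITERAL">g</tt>.</p>

```python
import re

page = "foo one, foo two, foo three"

# Like s/foo/bar/ : without g, only the first match is replaced.
first_only = re.sub(r"foo", "bar", page, count=1)

# Like s/foo/bar/g : with g, every match is replaced (count=0 means no limit).
every_match = re.sub(r"foo", "bar", page, count=0)

print(first_only)
print(every_match)
```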
229 <p>Let's look at some real filters for more interesting examples. Here
230 you see a filter that protects against some common annoyances that
arise from JavaScript abuse. Let's look at its jobs one after the
other:</p>
<table border="0" bgcolor="#E0E0E0" width="100%">
<tr>
<td>
<pre class="SCREEN">
FILTER: js-annoyances Get rid of particularly annoying JavaScript abuse

# Get rid of JavaScript referrer tracking. Test page: http://www.randomoddness.com/untitled.htm

s|(&lt;script.*)document\.referrer(.*&lt;/script&gt;)|$1"Not Your Business!"$2|Usg
</pre>
</td>
</tr>
</table>
248 <p>Following the header line and a comment, you see the job. Note that
249 it uses <tt class="LITERAL">|</tt> as the delimiter instead of
250 <tt class="LITERAL">/</tt>, because the pattern contains a forward
251 slash, which would otherwise have to be escaped by a backslash
252 (<tt class="LITERAL">\</tt>).</p>
<p>Now, let's examine the pattern: it starts with the text <tt class=
"LITERAL">&lt;script.*</tt> enclosed in parentheses. Since the dot
matches any character, and <tt class="LITERAL">*</tt> means:
<span class="QUOTE">"Match an arbitrary number of the element left of
myself"</span>, this matches <span class="QUOTE">"&lt;script"</span>,
followed by <span class="emphasis"><i class="EMPHASIS">any</i></span>
text, i.e. it matches the whole page, from the start of the first
&lt;script&gt; tag.</p>
263 <p>That's more than we want, but the pattern continues: <tt class=
264 "LITERAL">document\.referrer</tt> matches only the exact string
265 <span class="QUOTE">"document.referrer"</span>. The dot needed to be
266 <span class="emphasis"><i class="EMPHASIS">escaped</i></span>, i.e.
preceded by a backslash, to take away its special meaning as a wildcard,
and make it just a regular dot. So far, the meaning is: Match from the
start of the first &lt;script&gt; tag in the page, up to, and
270 including, the text <span class="QUOTE">"document.referrer"</span>, if
271 <span class="emphasis"><i class="EMPHASIS">both</i></span> are present
272 in the page (and appear in that order).</p>
274 <p>But there's still more pattern to go. The next element, again
enclosed in parentheses, is <tt class="LITERAL">.*&lt;/script&gt;</tt>.
You already know what <tt class="LITERAL">.*</tt> means, so the whole
pattern translates to: Match from the start of the first &lt;script&gt;
tag in a page to the end of the last &lt;/script&gt; tag, provided that
279 the text <span class="QUOTE">"document.referrer"</span> appears
280 somewhere in between.</p>
282 <p>This is still not the whole story, since we have ignored the options
and the parentheses: The portions of the page matched by sub-patterns
that are enclosed in parentheses will be remembered and be available
through the variables <tt class="LITERAL">$1, $2, ...</tt> in the
substitute. The <tt class="LITERAL">U</tt> option switches to ungreedy
matching, which means that the first <tt class="LITERAL">.*</tt> in the
pattern will only <span class="QUOTE">"eat up"</span> all text in
between <span class="QUOTE">"&lt;script"</span> and the <span class=
"emphasis"><i class="EMPHASIS">first</i></span> occurrence of
<span class="QUOTE">"document.referrer"</span>, and that the second
<tt class="LITERAL">.*</tt> will only span the text up to the
<span class="emphasis"><i class="EMPHASIS">first</i></span>
<span class="QUOTE">"&lt;/script&gt;"</span> tag. Furthermore, the
295 <tt class="LITERAL">s</tt> option says that the match may span multiple
296 lines in the page, and the <tt class="LITERAL">g</tt> option again
297 means that the substitution is global.</p>
299 <p>So, to summarize, the pattern means: Match all scripts that contain
300 the text <span class="QUOTE">"document.referrer"</span>. Remember the
301 parts of the script from (and including) the start tag up to (and
302 excluding) the string <span class="QUOTE">"document.referrer"</span> as
303 <tt class="LITERAL">$1</tt>, and the part following that string, up to
304 and including the closing tag, as <tt class="LITERAL">$2</tt>.</p>
306 <p>Now the pattern is deciphered, but wasn't this about substituting
things? So let's look at the substitute: <tt class="LITERAL">$1"Not Your
308 Business!"$2</tt> is easy to read: The text remembered as <tt class=
309 "LITERAL">$1</tt>, followed by <tt class="LITERAL">"Not Your
310 Business!"</tt> (<span class="emphasis"><i class=
311 "EMPHASIS">including</i></span> the quotation marks!), followed by the
312 text remembered as <tt class="LITERAL">$2</tt>. This produces an exact
313 copy of the original string, with the middle part (the <span class=
314 "QUOTE">"document.referrer"</span>) replaced by <tt class=
315 "LITERAL">"Not Your Business!"</tt>.</p>
317 <p>The whole job now reads: Replace <span class=
318 "QUOTE">"document.referrer"</span> by <tt class="LITERAL">"Not Your
Business!"</tt> wherever it appears inside a &lt;script&gt; tag. Note
320 that this job won't break JavaScript syntax, since both the original
321 and the replacement are syntactically valid string objects. The script
322 just won't have access to the referrer information anymore.</p>
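<p>The difference between greedy and ungreedy matching in this job can be
checked with a short Python sketch. It is an approximation, not the pcrs
engine: Python has no <tt class="LITERAL">U</tt> option, so the ungreedy
variant spells out <tt class="LITERAL">.*?</tt>, <tt class=
"LITERAL">re.S</tt> stands in for the <tt class="LITERAL">s</tt> option,
and <tt class="LITERAL">\g&lt;1&gt;</tt> is Python's way of writing
<tt class="LITERAL">$1</tt>. The sample page is made up.</p>

```python
import re

# Made-up page with three scripts; only the second reads the referrer.
page = (
    "<script>var a = 1;</script>\n"
    "<p>text</p>\n"
    "<script>\nvar r = document.referrer;\n</script>\n"
    "<script>var b = 2;</script>\n"
)

# Python has no U option, so ungreedy quantifiers are written as .*?
UNGREEDY = re.compile(r"(<script.*?)document\.referrer(.*?</script>)", re.S)
GREEDY = re.compile(r"(<script.*)document\.referrer(.*</script>)", re.S)
REPLACEMENT = r'\g<1>"Not Your Business!"\g<2>'

# re.sub replaces every match by default, like the g option.
cleaned = UNGREEDY.sub(REPLACEMENT, page)

# Compare how much text each variant swallows in a single match.
ungreedy_match = UNGREEDY.search(page).group(0)
greedy_match = GREEDY.search(page).group(0)

print(cleaned)
```

<p>In this sketch the ungreedy match stops at the first closing tag after
<span class="QUOTE">"document.referrer"</span>, while the greedy one runs
on to the last script on the page. Note that either match still begins at
the leftmost <span class="QUOTE">"&lt;script"</span>, so <tt class=
"LITERAL">$1</tt> can include earlier scripts. The result of the
substitution is the same.</p>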
324 <p>We'll show you two other jobs from the JavaScript taming department,
325 but this time only point out the constructs of special interest:</p>
<table border="0" bgcolor="#E0E0E0" width="100%">
<tr>
<td>
<pre class="SCREEN">
# The status bar is for displaying link targets, not pointless blahblah

s/window\.status\s*=\s*(['"]).*?\1/dUmMy=1/ig
</pre>
</td>
</tr>
</table>
339 <p><tt class="LITERAL">\s</tt> stands for whitespace characters (space,
340 tab, newline, carriage return, form feed), so that <tt class=
341 "LITERAL">\s*</tt> means: <span class="QUOTE">"zero or more
342 whitespace"</span>. The <tt class="LITERAL">?</tt> in <tt class=
343 "LITERAL">.*?</tt> makes this matching of arbitrary text ungreedy.
344 (Note that the <tt class="LITERAL">U</tt> option is not set). The
345 <tt class="LITERAL">['"]</tt> construct means: <span class="QUOTE">"a
346 single <span class="emphasis"><i class="EMPHASIS">or</i></span> a
347 double quote"</span>. Finally, <tt class="LITERAL">\1</tt> is a
348 back-reference to the first parenthesis just like <tt class=
349 "LITERAL">$1</tt> above, with the difference that in the <span class=
350 "emphasis"><i class="EMPHASIS">pattern</i></span>, a backslash
indicates a back-reference, whereas in the <span class=
"emphasis"><i class="EMPHASIS">substitute</i></span>, it's the
dollar sign.</p>
355 <p>So what does this job do? It replaces assignments of single- or
356 double-quoted strings to the <span class="QUOTE">"window.status"</span>
357 object with a dummy assignment (using a variable name that is hopefully
358 odd enough not to conflict with real variables in scripts). Thus, it
359 catches many cases where e.g. pointless descriptions are displayed in
the status bar instead of the link target when you move your mouse over
a link.</p>
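<p>To see the back-reference at work, the same pattern can be tried in
Python, whose <tt class="LITERAL">re</tt> module accepts it unchanged.
This is a sketch with made-up input, and <tt class="LITERAL">re.I</tt>
takes the place of the <tt class="LITERAL">i</tt> option.</p>

```python
import re

# Same pattern as the job above; \1 must match whichever quote opened the string.
STATUS_ASSIGNMENT = re.compile(r"""window\.status\s*=\s*(['"]).*?\1""", re.I)

# Made-up markup: one single-quoted and one double-quoted assignment.
html = (
    '<a href="/buy" onmouseover="window.status = \'Click here!\'; return true">x</a>'
    "<script>Window.Status=\"It's free\"; go();</script>"
)

# re.sub replaces all matches by default, like the g option.
rewritten = STATUS_ASSIGNMENT.sub("dUmMy=1", html)
print(rewritten)
```

<p>Because <tt class="LITERAL">\1</tt> has to repeat the opening quote, the
apostrophe inside the double-quoted string does not end the match early.</p>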
<table border="0" bgcolor="#E0E0E0" width="100%">
<tr>
<td>
<pre class="SCREEN">
# Kill OnUnload popups. Yummy. Test: http://www.zdnet.com/zdsubs/yahoo/tree/yfs.html

s/(&lt;body [^&gt;]*)onunload(.*&gt;)/$1never$2/iU
</pre>
</td>
</tr>
</table>
375 <p>Including the <a href=
376 "http://www.w3.org/TR/2000/REC-DOM-Level-2-Events-20001113/events.html#Events-eventgroupings-htmlevents"
377 target="_top">OnUnload event binding</a> in the HTML DOM was a
378 <span class="emphasis"><i class="EMPHASIS">CRIME</i></span>. When I
379 close a browser window, I want it to close and die. Basta. This job
380 replaces the <span class="QUOTE">"onunload"</span> attribute in
<span class="QUOTE">"&lt;body&gt;"</span> tags with the dummy word
<tt class="LITERAL">never</tt>. Note that the <tt class=
"LITERAL">i</tt> option makes the pattern matching case-insensitive.
Also note that ungreedy matching alone doesn't always guarantee a
minimal match: In the first parenthesis, we had to use <tt class=
"LITERAL">[^&gt;]*</tt> instead of <tt class="LITERAL">.*</tt> to
prevent the match from exceeding the &lt;body&gt; tag if it doesn't
contain <span class="QUOTE">"OnUnload"</span>, but the page's content
does.</p>
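<p>The following Python sketch illustrates why the negated character class
matters. It is an approximation under stated assumptions: the ungreedy
quantifiers are written out as <tt class="LITERAL">*?</tt> because Python
has no <tt class="LITERAL">U</tt> option, <tt class="LITERAL">count=1</tt>
mimics the absence of <tt class="LITERAL">g</tt>, and both sample pages and
the name <tt class="LITERAL">LEAKY</tt> are invented for the
comparison.</p>

```python
import re

# The job's pattern with explicit ungreedy quantifiers.
SAFE = re.compile(r"(<body [^>]*?)onunload(.*?>)", re.I)
# Hypothetical variant that uses .* instead of [^>]* in the first group.
LEAKY = re.compile(r"(<body .*?)onunload(.*?>)", re.I)
REPLACEMENT = r"\g<1>never\g<2>"

with_handler = '<body bgcolor="white" OnUnload="popup()"><p>Hi</p>'
without_handler = '<body bgcolor="white"><p>The onunload event fires on exit.</p>'

# count=1 replaces only the first match, as the job has no g option.
fixed = SAFE.sub(REPLACEMENT, with_handler, count=1)
untouched = SAFE.sub(REPLACEMENT, without_handler, count=1)
damaged = LEAKY.sub(REPLACEMENT, without_handler, count=1)

print(fixed)
print(untouched)
print(damaged)
```

<p>In the second page the body tag has no handler, yet the variant with
<tt class="LITERAL">.*</tt> runs past the end of the tag and rewrites the
word in the page text.</p>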
403 <p>The last example is from the fun department:</p>
<table border="0" bgcolor="#E0E0E0" width="100%">
<tr>
<td>
<pre class="SCREEN">
FILTER: fun Fun text replacements

# Spice the daily news:

s/microsoft(?!\.com)/MicroSuck/ig
</pre>
</td>
</tr>
</table>
419 <p>Note the <tt class="LITERAL">(?!\.com)</tt> part (a so-called
420 negative lookahead) in the job's pattern, which means: Don't match, if
421 the string <span class="QUOTE">".com"</span> appears directly following
422 <span class="QUOTE">"microsoft"</span> in the page. This prevents links
to microsoft.com from being trashed, while still replacing the word
elsewhere.</p>
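<p>Negative lookahead behaves the same way in Python's <tt class=
"LITERAL">re</tt> module, so the job's pattern can be tried as is. The
sample text below is made up for this sketch.</p>

```python
import re

# Same pattern as the job; re.I stands in for the i option.
MICROSOFT = re.compile(r"microsoft(?!\.com)", re.I)

text = 'Microsoft news: see <a href="http://www.microsoft.com/">microsoft.com</a>'

# Only the occurrence that is not followed by ".com" is replaced.
spiced = MICROSOFT.sub("MicroSuck", text)
print(spiced)
```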
<table border="0" bgcolor="#E0E0E0" width="100%">
<tr>
<td>
<pre class="SCREEN">
# Buzzword Bingo (example for extended regex syntax)

s* industry[ -]leading \
| customer[ -]focused \
| award[ -]winning # Comments are OK, too! \
| high[ -]performance \
| solutions[ -]based \
*&lt;font color="red"&gt;&lt;b&gt;BINGO!&lt;/b&gt;&lt;/font&gt; \
</pre>
</td>
</tr>
</table>
449 <p>The <tt class="LITERAL">x</tt> option in this job turns on extended
450 syntax, and allows for e.g. the liberal use of (non-interpreted!)
451 whitespace for nicer formatting.</p>
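<p>Extended syntax is also available in Python as <tt class=
"LITERAL">re.X</tt>, which makes for a convenient way to experiment. The
sketch below uses only the alternatives shown above and a made-up
sentence. As in the job, whitespace outside of character classes is
ignored, while the space inside <tt class="LITERAL">[ -]</tt> still
counts.</p>

```python
import re

# re.X ignores whitespace and comments in the pattern, except inside [...].
BUZZWORDS = re.compile(
    r"""
      industry[ -]leading
    | customer[ -]focused
    | award[ -]winning      # comments are allowed in extended syntax
    | high[ -]performance
    | solutions[ -]based
    """,
    re.I | re.X,
)

text = "Our award-winning, Industry Leading platform."
marked = BUZZWORDS.sub('<font color="red"><b>BINGO!</b></font>', text)
print(marked)
```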
453 <p>You get the idea?</p>
457 <h2 class="SECT2"><a name="PREDEFINED-FILTERS" id=
458 "PREDEFINED-FILTERS">9.2. The Pre-defined Filters</a></h2>
460 <p>The distribution <tt class="FILENAME">default.filter</tt> file
461 contains a selection of pre-defined filters for your convenience:</p>
463 <div class="VARIABLELIST">
465 <dt><span class="emphasis"><i class=
466 "EMPHASIS">js-annoyances</i></span></dt>
469 <p>The purpose of this filter is to get rid of particularly
470 annoying JavaScript abuse. To that end, it</p>
474 <p>replaces JavaScript references to the browser's referrer
475 information with the string "Not Your Business!". This
complements the <tt class="LITERAL"><a href=
477 "actions-file.html#HIDE-REFERRER">hide-referrer</a></tt>
478 action on the content level.</p>
482 <p>removes the bindings to the DOM's <a href=
483 "http://www.w3.org/TR/2000/REC-DOM-Level-2-Events-20001113/events.html#Events-eventgroupings-htmlevents"
484 target="_top">unload event</a> which we feel has no right to
485 exist and is responsible for most <span class="QUOTE">"exit
486 consoles"</span>, i.e. nasty windows that pop up when you
487 close another one.</p>
491 <p>removes code that causes new windows to be opened with
492 undesired properties, such as being full-screen,
493 non-resizeable, without location, status or menu bar etc.</p>
497 <p>Use with caution. This is an aggressive filter, and can break
498 sites that rely heavily on JavaScript.</p>
501 <dt><span class="emphasis"><i class=
502 "EMPHASIS">js-events</i></span></dt>
505 <p>This is a very radical measure. It removes virtually all
JavaScript event bindings, which means that scripts cannot react
to user actions such as mouse movements or clicks, window
resizing etc. anymore. Use with caution!</p>
510 <p>We <span class="emphasis"><i class="EMPHASIS">strongly
511 discourage</i></span> using this filter as a default since it
512 breaks many legitimate scripts. It is meant for use only on
513 extra-nasty sites (should you really need to go there).</p>
516 <dt><span class="emphasis"><i class=
517 "EMPHASIS">html-annoyances</i></span></dt>
<p>This filter will undo many common instances of HTML-based
annoyances.</p>
523 <p>The <tt class="LITERAL">BLINK</tt> and <tt class=
524 "LITERAL">MARQUEE</tt> tags are neutralized (yeah baby!), and
525 browser windows will be created as resizeable (as of course they
526 should be!), and will have location, scroll and menu bars -- even
527 if specified otherwise.</p>
530 <dt><span class="emphasis"><i class=
531 "EMPHASIS">content-cookies</i></span></dt>
534 <p>Most cookies are set in the HTTP dialog, where they can be
535 intercepted by the <tt class="LITERAL"><a href=
536 "actions-file.html#CRUNCH-INCOMING-COOKIES">crunch-incoming-cookies</a></tt>
537 and <tt class="LITERAL"><a href=
538 "actions-file.html#CRUNCH-OUTGOING-COOKIES">crunch-outgoing-cookies</a></tt>
539 actions. But web sites increasingly make use of HTML meta tags
and JavaScript to sneak cookies to the browser on the content
level.</p>
543 <p>This filter disables most HTML and JavaScript code that reads
544 or sets cookies. It cannot detect all clever uses of these types
545 of code, so it should not be relied on as an absolute fix. Use it
546 wherever you would also use the cookie crunch actions.</p>
549 <dt><span class="emphasis"><i class=
550 "EMPHASIS">refresh-tags</i></span></dt>
553 <p>Disable any refresh tags if the interval is greater than nine
554 seconds (so that redirections done via refresh tags are not
555 destroyed). This is useful for dial-on-demand setups, or for
556 those who find this HTML feature annoying.</p>
559 <dt><span class="emphasis"><i class=
560 "EMPHASIS">unsolicited-popups</i></span></dt>
563 <p>This filter attempts to prevent only <span class=
564 "QUOTE">"unsolicited"</span> pop-up windows from opening, yet
565 still allow pop-up windows that the user has explicitly chosen to
566 open. It was added in version 3.0.1, as an improvement over
567 earlier such filters.</p>
569 <p>Technical note: The filter works by redefining the window.open
570 JavaScript function to a dummy function, <tt class=
571 "LITERAL">PrivoxyWindowOpen()</tt>, during the loading and
572 rendering phase of each HTML page access, and restoring the
573 function afterward.</p>
575 <p>This is recommended only for browsers that cannot perform this
576 function reliably themselves. And be aware that some sites
require such windows in order to function normally. Use with
caution.</p>
581 <dt><span class="emphasis"><i class=
582 "EMPHASIS">all-popups</i></span></dt>
585 <p>Attempt to prevent <span class="emphasis"><i class=
586 "EMPHASIS">all</i></span> pop-up windows from opening. Note this
587 should be used with even more discretion than the above, since it
588 is more likely to break some sites that require pop-ups for
589 normal usage. Use with caution.</p>
592 <dt><span class="emphasis"><i class=
593 "EMPHASIS">img-reorder</i></span></dt>
596 <p>This is a helper filter that has no value if used alone. It
597 makes the <tt class="LITERAL">banners-by-size</tt> and <tt class=
598 "LITERAL">banners-by-link</tt> (see below) filters more effective
599 and should be enabled together with them.</p>
602 <dt><span class="emphasis"><i class=
603 "EMPHASIS">banners-by-size</i></span></dt>
606 <p>This filter removes image tags purely based on what size they
607 are. Fortunately for us, many ads and banner images tend to
608 conform to certain standardized sizes, which makes this filter
609 quite effective for ad stripping purposes.</p>
611 <p>Occasionally this filter will cause false positives on images
that are not ads, but just happen to be of one of the standard
sizes.</p>
615 <p>Recommended only for those who require extreme ad blocking.
The default block rules should catch 95+% of all ads <span class=
"emphasis"><i class="EMPHASIS">without</i></span> this filter
enabled.</p>
621 <dt><span class="emphasis"><i class=
622 "EMPHASIS">banners-by-link</i></span></dt>
625 <p>This is an experimental filter that attempts to kill any
626 banners if their URLs seem to point to known or suspected click
627 trackers. It is currently not of much value and is not
628 recommended for use by default.</p>
631 <dt><span class="emphasis"><i class=
632 "EMPHASIS">webbugs</i></span></dt>
<p>Webbugs are small, invisible images (technically 1x1 GIF
636 images), that are used to track users across websites, and
637 collect information on them. As an HTML page is loaded by the
638 browser, an embedded image tag causes the browser to contact a
639 third-party site, disclosing the tracking information through the
640 requested URL and/or cookies for that third-party domain, without
641 the user ever becoming aware of the interaction with the
642 third-party site. HTML-ized spam also uses a similar technique to
643 verify email addresses.</p>
645 <p>This filter removes the HTML code that loads such <span class=
646 "QUOTE">"webbugs"</span>.</p>
649 <dt><span class="emphasis"><i class=
650 "EMPHASIS">tiny-textforms</i></span></dt>
653 <p>A rather special-purpose filter that can be used to enlarge
654 textareas (those multi-line text boxes in web forms) and turn off
655 hard word wrap in them. It was written for the sourceforge.net
656 tracker system where such boxes are a nuisance, but it can be
657 handy on other sites, too.</p>
659 <p>It is not recommended to use this filter as a default.</p>
662 <dt><span class="emphasis"><i class=
663 "EMPHASIS">jumping-windows</i></span></dt>
666 <p>Many consider windows that move, or resize themselves to be
667 abusive. This filter neutralizes the related JavaScript code.
668 Note that some sites might not display or behave as intended when
669 using this filter. Use with caution.</p>
672 <dt><span class="emphasis"><i class=
673 "EMPHASIS">frameset-borders</i></span></dt>
676 <p>Some web designers seem to assume that everyone in the world
677 will view their web sites using the same browser brand and
678 version, screen resolution etc, because only that assumption
679 could explain why they'd use static frame sizes, yet prevent
680 their frames from being resized by the user, should they be too
681 small to show their whole content.</p>
683 <p>This filter removes the related HTML code. It should only be
684 applied to sites which need it.</p>
687 <dt><span class="emphasis"><i class=
688 "EMPHASIS">demoronizer</i></span></dt>
691 <p>Many Microsoft products that generate HTML use non-standard
692 extensions (read: violations) of the ISO 8859-1 aka Latin-1
693 character set. This can cause those HTML documents to display
694 with errors on standard-compliant platforms.</p>
696 <p>This filter translates the MS-only characters into Latin-1
697 equivalents. It is not necessary when using MS products, and will
698 cause corruption of all documents that use 8-bit character sets
699 other than Latin-1. It's mostly worthwhile for Europeans on
700 non-MS platforms, if weird garbage characters sometimes appear on
some pages, or user agents that don't correct for this on the
fly.</p>
705 <dt><span class="emphasis"><i class=
706 "EMPHASIS">shockwave-flash</i></span></dt>
709 <p>A filter for shockwave haters. As the name suggests, this
710 filter strips code out of web pages that is used to embed
711 shockwave flash objects.</p>
714 <dt><span class="emphasis"><i class=
715 "EMPHASIS">quicktime-kioskmode</i></span></dt>
718 <p>Change HTML code that embeds Quicktime objects so that
719 kioskmode, which prevents saving, is disabled.</p>
722 <dt><span class="emphasis"><i class="EMPHASIS">fun</i></span></dt>
725 <p>Text replacements for subversive browsing fun. Make fun of
726 your favorite Monopolist or play buzzword bingo.</p>
729 <dt><span class="emphasis"><i class=
730 "EMPHASIS">crude-parental</i></span></dt>
733 <p>A demonstration-only filter that shows how <span class=
734 "APPLICATION">Privoxy</span> can be used to delete web content on
738 <dt><span class="emphasis"><i class=
739 "EMPHASIS">ie-exploits</i></span></dt>
742 <p>An experimental collection of text replacements to disable
743 malicious HTML and JavaScript code that exploits known security
744 holes in Internet Explorer.</p>
746 <p>Presently, it only protects against Nimda and a cross-site
747 scripting bug, and would need active maintenance to provide more
748 substantial protection.</p>
751 <dt><span class="emphasis"><i class=
752 "EMPHASIS">site-specifics</i></span></dt>
755 <p>Some web sites have very specific problems, the cure for which
doesn't apply anywhere else, or could even cause damage on other
sites.</p>
759 <p>This is a collection of such site-specific cures which should
760 only be applied to the sites they were intended for, which is
761 what the supplied <tt class="FILENAME">default.action</tt> file
does. Users shouldn't need to change anything regarding this
filter.</p>
766 <dt><span class="emphasis"><i class=
767 "EMPHASIS">google</i></span></dt>
770 <p>A CSS based block for Google text ads. Also removes a width
771 limitation and the toolbar advertisement.</p>
774 <dt><span class="emphasis"><i class=
775 "EMPHASIS">yahoo</i></span></dt>
778 <p>Another CSS based block, this time for Yahoo text ads. And
779 removes a width limitation as well.</p>
782 <dt><span class="emphasis"><i class="EMPHASIS">msn</i></span></dt>
785 <p>Another CSS based block, this time for MSN text ads. And
786 removes tracking URLs, as well as a width limitation.</p>
789 <dt><span class="emphasis"><i class=
790 "EMPHASIS">blogspot</i></span></dt>
<p>Cleans up some Blogspot blogs. Read the fine print before
using this filter.</p>
796 <p>This filter also intentionally removes some navigation stuff
797 and sets the page width to 100%. As a result, some rounded
<span class="QUOTE">"corners"</span> would appear too early or not
799 at all and as fixing this would require a browser that
800 understands background-size (CSS3), they are removed instead.</p>
803 <dt><span class="emphasis"><i class=
804 "EMPHASIS">xml-to-html</i></span></dt>
<p>Server-header filter to change the Content-Type from xml to
html.</p>
811 <dt><span class="emphasis"><i class=
812 "EMPHASIS">html-to-xml</i></span></dt>
<p>Server-header filter to change the Content-Type from html to
xml.</p>
819 <dt><span class="emphasis"><i class=
820 "EMPHASIS">no-ping</i></span></dt>
823 <p>Removes the non-standard <tt class="LITERAL">ping</tt>
824 attribute from anchor and area HTML tags.</p>
827 <dt><span class="emphasis"><i class=
828 "EMPHASIS">hide-tor-exit-notation</i></span></dt>
831 <p>Client-header filter to remove the <b class="COMMAND">Tor</b>
832 exit node notation found in Host and Referer headers.</p>
<p>If <span class="APPLICATION">Privoxy</span> and <b class=
"COMMAND">Tor</b> are chained and <span class=
"APPLICATION">Privoxy</span> is configured to use socks4a, one
can use <span class=
"QUOTE">"http://www.example.org.foobar.exit/"</span> to access
839 the host <span class="QUOTE">"www.example.org"</span> through the
840 <b class="COMMAND">Tor</b> exit node <span class=
841 "QUOTE">"foobar"</span>.</p>
843 <p>As the HTTP client isn't aware of this notation, it treats the
844 whole string <span class=
845 "QUOTE">"www.example.org.foobar.exit"</span> as the host and uses it
846 for the <span class="QUOTE">"Host"</span> and <span class=
847 "QUOTE">"Referer"</span> headers. From the server's point of view,
848 the resulting headers are invalid and can cause problems.</p>
850 <p>An invalid <span class="QUOTE">"Referer"</span> header can
851 trigger <span class="QUOTE">"hot-linking"</span> protections, and an
852 invalid <span class="QUOTE">"Host"</span> header makes it
853 impossible for the server to find the right vhost (several
854 domains hosted on the same IP address).</p>
856 <p>This client-header filter removes the <span class=
857 "QUOTE">"foo.exit"</span> part in those headers to prevent the
858 problems mentioned above. Note that it only modifies the HTTP headers;
859 it does not prevent the server from detecting your
860 <b class="COMMAND">Tor</b> exit node based on the IP address the
861 request is coming from.</p>
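<p>The effect on the two headers can be illustrated with a small Python
sketch. It is not the pcrs job that ships with Privoxy; the regular
expression and the function name <tt class="LITERAL">strip_exit_notation</tt>
are assumptions made for this illustration only, and the sketch assumes the
notation always has the form <span class="QUOTE">".node.exit"</span> at the
end of the host name.</p>

```python
import re

# Illustrative pattern, not the pcrs job shipped with Privoxy: it matches
# ".<node>.exit" at the end of the host name in Host and Referer headers.
_EXIT_NOTATION = re.compile(
    r"^((?:Host|Referer):\s*(?:https?://)?[^/\s:]+?)"  # header name and real host
    r"\.[^./\s:]+\.exit"                               # ".<node>.exit" suffix
    r"(?=[:/\s]|$)",                                   # host name must end here
    re.IGNORECASE,
)


def strip_exit_notation(header_line: str) -> str:
    """Return header_line without the Tor exit node notation, if present."""
    return _EXIT_NOTATION.sub(r"\1", header_line, count=1)


if __name__ == "__main__":
    for line in (
        "Host: www.example.org.foobar.exit",
        "Referer: http://www.example.org.foobar.exit/index.html",
        "User-Agent: example.foobar.exit",
    ):
        print(strip_exit_notation(line))
```

<p>Only the <span class="QUOTE">"Host"</span> and <span class=
"QUOTE">"Referer"</span> lines are touched; any other header is returned
unchanged, which mirrors the scope described above.</p>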
868 <h2 class="SECT2"><a name="EXTERNAL-FILTER-SYNTAX" id=
869 "EXTERNAL-FILTER-SYNTAX">9.3. External filter syntax</a></h2>
871 <p>External filters are scripts or programs that can modify the content
872 in case common <tt class="LITERAL"><a href=
873 "actions-file.html#FILTER">filters</a></tt> aren't powerful enough.</p>
875 <p>External filters can be written in any language supported by the platform
876 <span class="APPLICATION">Privoxy</span> runs on.</p>
878 <p>They are controlled with the <tt class="LITERAL"><a href=
879 "actions-file.html#EXTERNAL-FILTER">external-filter</a></tt> action and
880 have to be defined in the <tt class="LITERAL"><a href=
881 "config.html#FILTERFILE">filterfile</a></tt> first.</p>
883 <p>The header looks like that of any other filter, but instead of pcrs jobs,
884 external filters contain a single job, which can be a program or a shell
885 script (which may call other scripts or programs).</p>
887 <p>External filters read the content from STDIN and write the rewritten
888 content to STDOUT. The environment variables PRIVOXY_URL, PRIVOXY_PATH,
889 PRIVOXY_HOST and PRIVOXY_ORIGIN can be used to get some details about the
892 <p><span class="APPLICATION">Privoxy</span> will temporarily store the
893 content to filter in the <tt class="LITERAL"><a href=
894 "config.html#TEMPORARY-DIRECTORY">temporary-directory</a></tt>.</p>
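<p>The STDIN/STDOUT contract described above can be sketched as a small
Python script. This is only an illustration under stated assumptions: the
host in <tt class="LITERAL">TARGET_HOST</tt>, the function name
<tt class="LITERAL">rewrite</tt> and the substitution it performs are
hypothetical; only the use of STDIN, STDOUT and the PRIVOXY_* environment
variables comes from the description.</p>

```python
import os
import sys
from typing import Mapping

# Hypothetical host whose pages should be rewritten.
TARGET_HOST = "www.example.org"


def rewrite(content: bytes, environ: Mapping[str, str]) -> bytes:
    """Replace a placeholder word, but only for content from TARGET_HOST."""
    if environ.get("PRIVOXY_HOST") != TARGET_HOST:
        return content  # pass everything else through untouched
    return content.replace(b"foo", b"bar")


def main() -> None:
    # Work on bytes: the content is not necessarily text.
    data = sys.stdin.buffer.read()
    sys.stdout.buffer.write(rewrite(data, os.environ))
    sys.stdout.buffer.flush()


# Run only when the PRIVOXY_URL variable is present, so that the module
# can be imported or tested without blocking on STDIN.
if __name__ == "__main__" and "PRIVOXY_URL" in os.environ:
    main()
```

<p>Keeping the rewriting logic in a function that takes the content and the
environment as arguments makes it possible to test the filter without
running <span class="APPLICATION">Privoxy</span>. The script itself would
then be the single job of an external filter definition.</p>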
896 <table border="0" bgcolor="#E0E0E0" width="100%">
900 EXTERNAL-FILTER: cat Pointless example filter that doesn't actually modify the content
903 # Incorrect reimplementation of the filter above in POSIX shell.
905 # Note that it's a single job that spans multiple lines. The line
906 # breaks are not passed to the shell, so the semicolons are required.
908 # If the script isn't trivial, it is recommended to put it into an external file.
910 # In general, writing external filters entirely in POSIX shell is not
911 # considered a good idea.
912 EXTERNAL-FILTER: cat2 Pointless example filter that despite its name may actually modify the content
918 EXTERNAL-FILTER: rotate-image Rotate an image by 180 degrees. Test filter with limited value.
919 /usr/local/bin/convert - -rotate 180 -
921 EXTERNAL-FILTER: citation-needed Adds a "[citation needed]" tag to an image. The coordinates may need adjustment.
922 /usr/local/bin/convert - -pointsize 16 -fill white -annotate +17+418 "[citation needed]" -
928 <div class="WARNING">
929 <table class="WARNING" border="1" width="100%">
931 <td align="center"><b>Warning</b></td>
936 <p>Currently external filters are executed with <span class=
937 "APPLICATION">Privoxy</span>'s privileges! Only use external
938 filters you understand and trust.</p>
944 <p>External filters are experimental and the syntax may change in the
949 <div class="NAVFOOTER">
950 <hr align="left" width="100%">
952 <table summary="Footer navigation table" width="100%" border="0"
953 cellpadding="0" cellspacing="0">
955 <td width="33%" align="left" valign="top"><a href="actions-file.html"
956 accesskey="P">Prev</a></td>
958 <td width="34%" align="center" valign="top"><a href="index.html"
959 accesskey="H">Home</a></td>
961 <td width="33%" align="right" valign="top"><a href="templates.html"
962 accesskey="N">Next</a></td>
966 <td width="33%" align="left" valign="top">Actions Files</td>
968 <td width="34%" align="center" valign="top"> </td>
970 <td width="33%" align="right" valign="top">Privoxy's Template