1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01
2 Transitional//EN""http://www.w3.org/TR/html4/loose.dtd">
8 <meta name="GENERATOR" content=
9 "Modular DocBook HTML Stylesheet Version 1.79">
10 <link rel="HOME" title="Privoxy 3.0.26 User Manual" href="index.html">
11 <link rel="PREVIOUS" title="Actions Files" href="actions-file.html">
12 <link rel="NEXT" title="Privoxy's Template Files" href="templates.html">
13 <link rel="STYLESHEET" type="text/css" href="../p_doc.css">
14 <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
15 <link rel="STYLESHEET" type="text/css" href="p_doc.css">
17 <body class="SECT1" bgcolor="#EEEEEE" text="#000000" link="#0000FF" vlink=
18 "#840084" alink="#0000FF">
19 <div class="NAVHEADER">
20 <table summary="Header navigation table" width="100%" border="0"
21 cellpadding="0" cellspacing="0">
23 <th colspan="3" align="center">
24 Privoxy 3.0.26 User Manual
28 <td width="10%" align="left" valign="bottom">
29 <a href="actions-file.html" accesskey="P">Prev</a>
31 <td width="80%" align="center" valign="bottom">
33 <td width="10%" align="right" valign="bottom">
34 <a href="templates.html" accesskey="N">Next</a>
38 <hr align="LEFT" width="100%">
42 <a name="FILTER-FILE">9. Filter Files</a>
45 On-the-fly text substitutions need to be defined in a <span class=
46 "QUOTE">"filter file"</span>. Once defined, they can then be invoked
47 as an <span class="QUOTE">"action"</span>.
<span class="APPLICATION">Privoxy</span> supports three different
pcrs-based filter actions: <tt class="LITERAL"><a href=
"actions-file.html#FILTER">filter</a></tt> to rewrite the content
that is sent to the client, <tt class="LITERAL"><a href=
"actions-file.html#CLIENT-HEADER-FILTER">client-header-filter</a></tt>
to rewrite headers that are sent by the client, and <tt class=
"LITERAL"><a href=
"actions-file.html#SERVER-HEADER-FILTER">server-header-filter</a></tt>
to rewrite headers that are sent by the server.
61 <span class="APPLICATION">Privoxy</span> also supports two tagger
62 actions: <tt class="LITERAL"><a href=
63 "actions-file.html#CLIENT-HEADER-TAGGER">client-header-tagger</a></tt>
64 and <tt class="LITERAL"><a href=
65 "actions-file.html#SERVER-HEADER-TAGGER">server-header-tagger</a></tt>.
Taggers and filters use the same syntax in the filter files. The
difference is that taggers don't modify the text they are filtering,
but use a rewritten version of the filtered text as a tag. The tags can
then be used to change which actions apply through sections with <a
href="actions-file.html#TAG-PATTERN">tag-patterns</a>.
73 Finally <span class="APPLICATION">Privoxy</span> supports the <tt
74 class="LITERAL"><a href=
75 "actions-file.html#EXTERNAL-FILTER">external-filter</a></tt> action
76 to enable <tt class="LITERAL"><a href=
77 "filter-file.html#EXTERNAL-FILTER-SYNTAX">external filters</a></tt>
78 written in proper programming languages.
81 Multiple filter files can be defined through the <tt class=
82 "LITERAL"><a href="config.html#FILTERFILE">filterfile</a></tt> config
83 directive. The filters as supplied by the developers are located in
84 <tt class="FILENAME">default.filter</tt>. It is recommended that any
85 locally defined or modified filters go in a separately defined file
86 such as <tt class="FILENAME">user.filter</tt>.
Common tasks for content filters are to eliminate common annoyances
in HTML and JavaScript, such as pop-up windows, exit consoles,
crippled windows without navigation tools, or the infamous &lt;BLINK&gt;
tag, to suppress images with certain width and height attributes
(standard banner sizes or web-bugs), or just to have fun.
Enabled content filters are applied to any content whose <span class=
"QUOTE">"Content-Type"</span> header is recognised as a sign of
text-based content, with the exception of <tt class=
"LITERAL">text/plain</tt>. Use the <a href=
"actions-file.html#FORCE-TEXT-MODE">force-text-mode</a> action to
also filter other content.
104 Substitutions are made at the source level, so if you want to <span
105 class="QUOTE">"roll your own"</span> filters, you should first be
106 familiar with HTML syntax, and, of course, regular expressions.
109 Just like the <a href="actions-file.html">actions files</a>, the
110 filter file is organized in sections, which are called <span class=
111 "emphasis"><i class="EMPHASIS">filters</i></span> here. Each filter
consists of a heading line that starts with one of the <span class=
113 "emphasis"><i class="EMPHASIS">keywords</i></span> <tt class=
114 "LITERAL">FILTER:</tt>, <tt class=
115 "LITERAL">CLIENT-HEADER-FILTER:</tt> or <tt class=
116 "LITERAL">SERVER-HEADER-FILTER:</tt> followed by the filter's <span
117 class="emphasis"><i class="EMPHASIS">name</i></span>, and a short
118 (one line) <span class="emphasis"><i class=
119 "EMPHASIS">description</i></span> of what it does. Below that line
120 come the <span class="emphasis"><i class="EMPHASIS">jobs</i></span>,
121 i.e. lines that define the actual text substitutions. By convention,
the name of a filter should describe what the filter <span class=
"emphasis"><i class="EMPHASIS">eliminates</i></span>. The description is
shown in the <a href="http://config.privoxy.org/" target=
"_top">web-based user interface</a>.
128 Once a filter called <tt class="REPLACEABLE"><i>name</i></tt> has
129 been defined in the filter file, it can be invoked by using an action
130 of the form +<tt class="LITERAL"><a href=
131 "actions-file.html#FILTER">filter</a>{<tt class=
132 "REPLACEABLE"><i>name</i></tt>}</tt> in any <a href=
133 "actions-file.html">actions file</a>.
136 Filter definitions start with a header line that contains the filter
137 type, the filter name and the filter description. A content filter
138 header line for a filter called <span class="QUOTE">"foo"</span>
139 could look like this:
143 <table border="0" bgcolor="#E0E0E0" width="100%">
147 FILTER: foo Replace all "foo" with "bar"
154 Below that line, and up to the next header line, come the jobs that
155 define what text replacements the filter executes. They are specified
in a syntax that imitates <a href="http://www.perl.org/" target=
"_top">Perl</a>'s <tt class="LITERAL">s///</tt> operator. If you are
familiar with Perl, you will find this to be quite intuitive, and may
want to look at the PCRS documentation for the subtle differences to
Perl behaviour.
Most notably, the non-standard option letter <tt class=
"LITERAL">U</tt> is supported, which turns the default to ungreedy
matching (add <tt class="LITERAL">?</tt> to quantifiers to turn them
greedy again).
The non-standard option letter <tt class="LITERAL">D</tt> (dynamic)
allows the use of the variables $host, $origin (the IP address the
request came from), $path, $url and $listen-address (the address on
which Privoxy accepted the client request, for example 127.0.0.1:8118).
They will be replaced with the value they refer to before the filter
is executed.
Note that '$' is a bad choice for a delimiter in a dynamic filter, as
you might end up with unintended variables if you use a variable name
directly after the delimiter. Variables are resolved without
escaping anything, so you also have to be careful not to choose
delimiters that appear in the replacement text. For example, '&lt;'
should be safe, while '?' will sooner or later cause conflicts with
$url.
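<p>
The delimiter pitfall can be illustrated with a small Python sketch. It
is not Privoxy's implementation: the helper
<tt class="LITERAL">expand_dynamic_job</tt> and the request values are
made up for this example, and the only assumption taken from the text is
that variable values are pasted into the job verbatim, without escaping.
</p>

```python
def expand_dynamic_job(job: str, values: dict) -> str:
    """Paste variable values into the job text verbatim, with no escaping."""
    # Longest names first, so no name is replaced as a prefix of another.
    for name in sorted(values, key=len, reverse=True):
        job = job.replace("$" + name, values[name])
    return job


# Hypothetical request details, made up for this illustration.
values = {"host": "www.example.org", "url": "http://www.example.org/page?id=1"}

# With '?' as delimiter, the '?' inside the URL adds an extra delimiter.
unsafe = expand_dynamic_job("s?$url?[removed]?D", values)
# With '<' as delimiter, the expanded job still has exactly three delimiters.
safe = expand_dynamic_job("s<$url<[removed]<D", values)

print(unsafe.count("?"))  # 4 delimiters instead of the intended 3
print(safe.count("<"))    # 3
```

<p>
The expanded job with four delimiters can no longer be split into
pattern, substitute and options as intended, which is the conflict the
note above warns about.
</p>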
The non-standard option letter <tt class="LITERAL">T</tt> (trivial)
prevents parsing for backreferences in the substitute. Use it if you
want to include text like '$&amp;' in your substitute without
quoting it.
192 If you are new to <a href=
193 "http://en.wikipedia.org/wiki/Regular_expressions" target=
194 "_top"><span class="QUOTE">"Regular Expressions"</span></a>, you
195 might want to take a look at the <a href=
196 "appendix.html#REGEX">Appendix on regular expressions</a>, and see
197 the <a href="http://perldoc.perl.org/perlre.html" target="_top">Perl
198 manual</a> for <a href="http://perldoc.perl.org/perlop.html" target=
199 "_top">the <tt class="LITERAL">s///</tt> operator's syntax</a> and <a
200 href="http://perldoc.perl.org/perlre.html" target="_top">Perl-style
201 regular expressions</a> in general. The below examples might also
202 help to get you started.
206 <a name="FILTER-FILE-TUT">9.1. Filter File Tutorial</a>
209 Now, let's complete our <span class="QUOTE">"foo"</span> content
210 filter. We have already defined the heading, but the jobs are still
211 missing. Since all it does is to replace <span class=
212 "QUOTE">"foo"</span> with <span class="QUOTE">"bar"</span>, there
213 is only one (trivial) job needed:
217 <table border="0" bgcolor="#E0E0E0" width="100%">
228 But wait! Didn't the comment say that <span class="emphasis"><i
229 class="EMPHASIS">all</i></span> occurrences of <span class=
230 "QUOTE">"foo"</span> should be replaced? Our current job will only
231 take care of the first <span class="QUOTE">"foo"</span> on each
232 page. For global substitution, we'll need to add the <tt class=
233 "LITERAL">g</tt> option:
237 <table border="0" bgcolor="#E0E0E0" width="100%">
248 Our complete filter now looks like this:
252 <table border="0" bgcolor="#E0E0E0" width="100%">
256 FILTER: foo Replace all "foo" with "bar"
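<p>
The effect of the <tt class="LITERAL">g</tt> option can be tried out
with any Perl-style regex engine. The following sketch uses Python's
<tt class="LITERAL">re</tt> module as a stand-in for PCRS, where
<tt class="LITERAL">count=1</tt> plays the role of a job without
<tt class="LITERAL">g</tt>. The sample text is made up.
</p>

```python
import re

page = "foo and foo again"

# Without the g option: only the first match is replaced.
first_only = re.sub(r"foo", "bar", page, count=1)
# With the g option: every match is replaced.
replace_all = re.sub(r"foo", "bar", page)

print(first_only)   # bar and foo again
print(replace_all)  # bar and bar again
```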
Let's look at some real filters for more interesting examples. Here
you see a filter that protects against some common annoyances that
arise from JavaScript abuse. Let's look at its jobs one after the
other:
271 <table border="0" bgcolor="#E0E0E0" width="100%">
275 FILTER: js-annoyances Get rid of particularly annoying JavaScript abuse
277 # Get rid of JavaScript referrer tracking. Test page: http://www.randomoddness.com/untitled.htm
s|(&lt;script.*)document\.referrer(.*&lt;/script&gt;)|$1"Not Your Business!"$2|Usg
286 Following the header line and a comment, you see the job. Note that
287 it uses <tt class="LITERAL">|</tt> as the delimiter instead of <tt
288 class="LITERAL">/</tt>, because the pattern contains a forward
289 slash, which would otherwise have to be escaped by a backslash (<tt
290 class="LITERAL">\</tt>).
Now, let's examine the pattern: it starts with the text <tt class=
"LITERAL">&lt;script.*</tt> enclosed in parentheses. Since the dot
matches any character, and <tt class="LITERAL">*</tt> means: <span
class="QUOTE">"Match an arbitrary number of the element left of
myself"</span>, this matches <span class=
"QUOTE">"&lt;script"</span>, followed by <span class="emphasis"><i
class="EMPHASIS">any</i></span> text, i.e. it matches the whole
page, from the start of the first &lt;script&gt; tag.
That's more than we want, but the pattern continues: <tt class=
"LITERAL">document\.referrer</tt> matches only the exact string
<span class="QUOTE">"document.referrer"</span>. The dot needed to
be <span class="emphasis"><i class="EMPHASIS">escaped</i></span>,
i.e. preceded by a backslash, to take away its special meaning as a
wildcard, and make it just a regular dot. So far, the meaning is:
Match from the start of the first &lt;script&gt; tag in the page,
up to, and including, the text <span class=
"QUOTE">"document.referrer"</span>, if <span class="emphasis"><i
class="EMPHASIS">both</i></span> are present in the page (and
appear in that order).
But there's still more pattern to go. The next element, again
enclosed in parentheses, is <tt class=
"LITERAL">.*&lt;/script&gt;</tt>. You already know what <tt class=
"LITERAL">.*</tt> means, so the whole pattern translates to: Match
from the start of the first &lt;script&gt; tag in a page to the end
of the last &lt;/script&gt; tag, provided that the text <span class=
"QUOTE">"document.referrer"</span> appears somewhere in between.
This is still not the whole story, since we have ignored the
options and the parentheses: The portions of the page matched by
sub-patterns that are enclosed in parentheses will be remembered
and be available through the variables <tt class="LITERAL">$1, $2,
...</tt> in the substitute. The <tt class="LITERAL">U</tt> option
switches to ungreedy matching, which means that the first <tt
class="LITERAL">.*</tt> in the pattern will only <span class=
"QUOTE">"eat up"</span> all text in between <span class=
"QUOTE">"&lt;script"</span> and the <span class="emphasis"><i
class="EMPHASIS">first</i></span> occurrence of <span class=
"QUOTE">"document.referrer"</span>, and that the second <tt class=
"LITERAL">.*</tt> will only span the text up to the <span class=
"emphasis"><i class="EMPHASIS">first</i></span> <span class=
"QUOTE">"&lt;/script&gt;"</span> tag. Furthermore, the <tt class=
"LITERAL">s</tt> option says that the match may span multiple lines
in the page, and the <tt class="LITERAL">g</tt> option again means
that the substitution is global.
344 So, to summarize, the pattern means: Match all scripts that contain
345 the text <span class="QUOTE">"document.referrer"</span>. Remember
346 the parts of the script from (and including) the start tag up to
347 (and excluding) the string <span class=
348 "QUOTE">"document.referrer"</span> as <tt class="LITERAL">$1</tt>,
349 and the part following that string, up to and including the closing
350 tag, as <tt class="LITERAL">$2</tt>.
Now the pattern is deciphered, but wasn't this about substituting
things? So let's look at the substitute: <tt class="LITERAL">$1"Not
355 Your Business!"$2</tt> is easy to read: The text remembered as <tt
356 class="LITERAL">$1</tt>, followed by <tt class="LITERAL">"Not Your
357 Business!"</tt> (<span class="emphasis"><i class=
358 "EMPHASIS">including</i></span> the quotation marks!), followed by
359 the text remembered as <tt class="LITERAL">$2</tt>. This produces
360 an exact copy of the original string, with the middle part (the
361 <span class="QUOTE">"document.referrer"</span>) replaced by <tt
362 class="LITERAL">"Not Your Business!"</tt>.
The whole job now reads: Replace <span class=
"QUOTE">"document.referrer"</span> by <tt class="LITERAL">"Not Your
Business!"</tt> wherever it appears inside a &lt;script&gt; tag.
Note that this job won't break JavaScript syntax, since both the
original and the replacement are syntactically valid string
objects. The script just won't have access to the referrer
information anymore.
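<p>
The referrer job can be approximated in Python for experimenting. This
is only a sketch with Python's <tt class="LITERAL">re</tt> module
standing in for PCRS: the <tt class="LITERAL">U</tt> option is written
out as explicit ungreedy quantifiers, <tt class="LITERAL">s</tt> becomes
<tt class="LITERAL">re.DOTALL</tt>, and <tt class="LITERAL">g</tt> is
the default behaviour of <tt class="LITERAL">sub()</tt>. The function
name and the sample page are made up.
</p>

```python
import re

# U flips the quantifiers to ungreedy, written here as explicit ".*?";
# s lets "." match newlines (re.DOTALL); g is the default for sub().
JOB = re.compile(r"(<script.*?)document\.referrer(.*?</script>)", re.DOTALL)


def hide_referrer(page: str) -> str:
    """Replace document.referrer inside scripts with a string literal."""
    # \1 and \2 correspond to $1 and $2 in the filter file's substitute.
    return JOB.sub(r'\1"Not Your Business!"\2', page)


page = "<script>\nvar r = document.referrer;\n</script><p>document.referrer</p>"
print(hide_referrer(page))
```

<p>
In this sample, the occurrence inside the script is rewritten, while the
one in the paragraph after the closing tag is left alone because no
script start tag precedes it.
</p>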
We'll show you two other jobs from the JavaScript taming
department, but this time only point out the constructs of special
interest:
380 <table border="0" bgcolor="#E0E0E0" width="100%">
384 # The status bar is for displaying link targets, not pointless blahblah
386 s/window\.status\s*=\s*(['"]).*?\1/dUmMy=1/ig
393 <tt class="LITERAL">\s</tt> stands for whitespace characters
394 (space, tab, newline, carriage return, form feed), so that <tt
395 class="LITERAL">\s*</tt> means: <span class="QUOTE">"zero or more
396 whitespace"</span>. The <tt class="LITERAL">?</tt> in <tt class=
397 "LITERAL">.*?</tt> makes this matching of arbitrary text ungreedy.
398 (Note that the <tt class="LITERAL">U</tt> option is not set). The
399 <tt class="LITERAL">['"]</tt> construct means: <span class=
400 "QUOTE">"a single <span class="emphasis"><i class=
401 "EMPHASIS">or</i></span> a double quote"</span>. Finally, <tt
402 class="LITERAL">\1</tt> is a back-reference to the first
403 parenthesis just like <tt class="LITERAL">$1</tt> above, with the
404 difference that in the <span class="emphasis"><i class=
405 "EMPHASIS">pattern</i></span>, a backslash indicates a
406 back-reference, whereas in the <span class="emphasis"><i class=
407 "EMPHASIS">substitute</i></span>, it's the dollar.
410 So what does this job do? It replaces assignments of single- or
411 double-quoted strings to the <span class=
412 "QUOTE">"window.status"</span> object with a dummy assignment
413 (using a variable name that is hopefully odd enough not to conflict
414 with real variables in scripts). Thus, it catches many cases where
415 e.g. pointless descriptions are displayed in the status bar instead
416 of the link target when you move your mouse over links.
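<p>
A rough Python equivalent of the status bar job, again using the
<tt class="LITERAL">re</tt> module as a stand-in for PCRS, shows the
back-reference at work. The function name and the test input are
assumptions made for this sketch.
</p>

```python
import re

# i -> re.IGNORECASE; g -> sub() replaces all matches by default.
# (['"]) remembers the opening quote, \1 requires the same closing quote.
JOB = re.compile(r"""window\.status\s*=\s*(['"]).*?\1""", re.IGNORECASE)


def neutralize_status(page: str) -> str:
    """Turn quoted-string assignments to window.status into dummy ones."""
    return JOB.sub("dUmMy=1", page)


print(neutralize_status("<a onMouseOver=\"Window.Status = 'hi'\">"))
# <a onMouseOver="dUmMy=1">
```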
420 <table border="0" bgcolor="#E0E0E0" width="100%">
424 # Kill OnUnload popups. Yummy. Test: http://www.zdnet.com/zdsubs/yahoo/tree/yfs.html
s/(&lt;body [^&gt;]*)onunload(.*&gt;)/$1never$2/iU
433 Including the <a href=
434 "http://www.w3.org/TR/2000/REC-DOM-Level-2-Events-20001113/events.html#Events-eventgroupings-htmlevents"
435 target="_top">OnUnload event binding</a> in the HTML DOM was a
436 <span class="emphasis"><i class="EMPHASIS">CRIME</i></span>. When I
437 close a browser window, I want it to close and die. Basta. This job
438 replaces the <span class="QUOTE">"onunload"</span> attribute in
<span class="QUOTE">"&lt;body&gt;"</span> tags with the dummy word
440 <tt class="LITERAL">never</tt>. Note that the <tt class=
441 "LITERAL">i</tt> option makes the pattern matching
442 case-insensitive. Also note that ungreedy matching alone doesn't
443 always guarantee a minimal match: In the first parenthesis, we had
to use <tt class="LITERAL">[^&gt;]*</tt> instead of <tt class=
"LITERAL">.*</tt> to prevent the match from exceeding the
&lt;body&gt; tag if it doesn't contain <span class=
447 "QUOTE">"OnUnload"</span>, but the page's content does.
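<p>
The point about the negated character class can be checked with a small
Python sketch. Python's <tt class="LITERAL">re</tt> module is used as a
stand-in for PCRS, the <tt class="LITERAL">U</tt> option is spelled out
as ungreedy quantifiers, and <tt class="LITERAL">count=1</tt> mirrors
the missing <tt class="LITERAL">g</tt> option. Function name and sample
pages are made up.
</p>

```python
import re

# i -> re.IGNORECASE; U -> ungreedy quantifiers ("*?").
# [^>]*? cannot run past the end of the <body ...> tag.
JOB = re.compile(r"(<body [^>]*?)onunload(.*?>)", re.IGNORECASE)


def kill_onunload(page: str) -> str:
    """Rename the onunload attribute of the body tag to 'never'."""
    return JOB.sub(r"\1never\2", page, count=1)


# The attribute inside the body tag is renamed.
print(kill_onunload('<BODY bgcolor="white" OnUnload="popup()">text'))
# <BODY bgcolor="white" never="popup()">text

# The body tag has no onunload attribute, so later content is untouched.
print(kill_onunload('<body bgcolor="white"><p>onunload is evil</p>'))
# <body bgcolor="white"><p>onunload is evil</p>
```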
450 The last example is from the fun department:
454 <table border="0" bgcolor="#E0E0E0" width="100%">
458 FILTER: fun Fun text replacements
460 # Spice the daily news:
462 s/microsoft(?!\.com)/MicroSuck/ig
469 Note the <tt class="LITERAL">(?!\.com)</tt> part (a so-called
470 negative lookahead) in the job's pattern, which means: Don't match,
471 if the string <span class="QUOTE">".com"</span> appears directly
472 following <span class="QUOTE">"microsoft"</span> in the page. This
473 prevents links to microsoft.com from being trashed, while still
474 replacing the word everywhere else.
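<p>
Negative lookahead works the same way in most Perl-style engines, so the
behaviour is easy to try out. The sketch below uses Python's
<tt class="LITERAL">re</tt> module as a stand-in for PCRS. The function
name and the sample sentence are made up.
</p>

```python
import re

# (?!\.com) succeeds only if ".com" does NOT follow at this position.
JOB = re.compile(r"microsoft(?!\.com)", re.IGNORECASE)


def spice(text: str) -> str:
    """Replace the word everywhere except directly before '.com'."""
    return JOB.sub("MicroSuck", text)


print(spice("Microsoft news, see http://www.microsoft.com/"))
# MicroSuck news, see http://www.microsoft.com/
```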
478 <table border="0" bgcolor="#E0E0E0" width="100%">
482 # Buzzword Bingo (example for extended regex syntax)
484 s* industry[ -]leading \
486 | customer[ -]focused \
488 | award[ -]winning # Comments are OK, too! \
489 | high[ -]performance \
490 | solutions[ -]based \
*&lt;font color="red"&gt;&lt;b&gt;BINGO!&lt;/b&gt;&lt;/font&gt; \
502 The <tt class="LITERAL">x</tt> option in this job turns on extended
503 syntax, and allows for e.g. the liberal use of (non-interpreted!)
504 whitespace for nicer formatting.
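<p>
Python's <tt class="LITERAL">re.VERBOSE</tt> flag behaves much like the
<tt class="LITERAL">x</tt> option, so the idea can be sketched there.
Only the alternatives visible in the job above are used. Case-insensitive
and global matching are assumptions of this sketch, and the replacement
is shortened to plain text.
</p>

```python
import re

# In verbose mode, whitespace and "#" comments in the pattern are ignored,
# except inside character classes such as [ -], where the space still counts.
BUZZWORDS = re.compile(
    r"""
      industry[ -]leading
    | customer[ -]focused
    | award[ -]winning      # comments are OK, too
    | high[ -]performance
    | solutions[ -]based
    """,
    re.VERBOSE | re.IGNORECASE,
)


def bingo(text: str) -> str:
    """Replace every listed buzzword with BINGO!"""
    return BUZZWORDS.sub("BINGO!", text)


print(bingo("Our award-winning, industry leading product"))
# Our BINGO!, BINGO! product
```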
512 <a name="PREDEFINED-FILTERS">9.2. The Pre-defined Filters</a>
515 The distribution <tt class="FILENAME">default.filter</tt> file
516 contains a selection of pre-defined filters for your convenience:
518 <div class="VARIABLELIST">
521 <span class="emphasis"><i class=
522 "EMPHASIS">js-annoyances</i></span>
526 The purpose of this filter is to get rid of particularly
527 annoying JavaScript abuse. To that end, it
532 replaces JavaScript references to the browser's referrer
533 information with the string "Not Your Business!". This
complements the <tt class="LITERAL"><a href=
535 "actions-file.html#HIDE-REFERRER">hide-referrer</a></tt>
536 action on the content level.
541 removes the bindings to the DOM's <a href=
542 "http://www.w3.org/TR/2000/REC-DOM-Level-2-Events-20001113/events.html#Events-eventgroupings-htmlevents"
543 target="_top">unload event</a> which we feel has no
544 right to exist and is responsible for most <span class=
545 "QUOTE">"exit consoles"</span>, i.e. nasty windows that
546 pop up when you close another one.
551 removes code that causes new windows to be opened with
552 undesired properties, such as being full-screen,
553 non-resizeable, without location, status or menu bar etc.
559 Use with caution. This is an aggressive filter, and can break
560 sites that rely heavily on JavaScript.
564 <span class="emphasis"><i class="EMPHASIS">js-events</i></span>
568 This is a very radical measure. It removes virtually all
JavaScript event bindings, which means that scripts cannot
react to user actions such as mouse movements or clicks,
window resizing etc. anymore. Use with caution!
574 We <span class="emphasis"><i class="EMPHASIS">strongly
575 discourage</i></span> using this filter as a default since it
576 breaks many legitimate scripts. It is meant for use only on
577 extra-nasty sites (should you really need to go there).
581 <span class="emphasis"><i class=
582 "EMPHASIS">html-annoyances</i></span>
586 This filter will undo many common instances of HTML based
590 The <tt class="LITERAL">BLINK</tt> and <tt class=
591 "LITERAL">MARQUEE</tt> tags are neutralized (yeah baby!), and
592 browser windows will be created as resizeable (as of course
593 they should be!), and will have location, scroll and menu
594 bars -- even if specified otherwise.
598 <span class="emphasis"><i class=
599 "EMPHASIS">content-cookies</i></span>
603 Most cookies are set in the HTTP dialog, where they can be
604 intercepted by the <tt class="LITERAL"><a href=
605 "actions-file.html#CRUNCH-INCOMING-COOKIES">crunch-incoming-cookies</a></tt>
606 and <tt class="LITERAL"><a href=
607 "actions-file.html#CRUNCH-OUTGOING-COOKIES">crunch-outgoing-cookies</a></tt>
608 actions. But web sites increasingly make use of HTML meta
609 tags and JavaScript to sneak cookies to the browser on the
613 This filter disables most HTML and JavaScript code that reads
614 or sets cookies. It cannot detect all clever uses of these
615 types of code, so it should not be relied on as an absolute
fix. Use it wherever you would also use the cookie crunch
actions.
621 <span class="emphasis"><i class=
622 "EMPHASIS">refresh-tags</i></span>
626 Disable any refresh tags if the interval is greater than nine
627 seconds (so that redirections done via refresh tags are not
628 destroyed). This is useful for dial-on-demand setups, or for
629 those who find this HTML feature annoying.
633 <span class="emphasis"><i class=
634 "EMPHASIS">unsolicited-popups</i></span>
638 This filter attempts to prevent only <span class=
639 "QUOTE">"unsolicited"</span> pop-up windows from opening, yet
640 still allow pop-up windows that the user has explicitly
641 chosen to open. It was added in version 3.0.1, as an
642 improvement over earlier such filters.
645 Technical note: The filter works by redefining the
646 window.open JavaScript function to a dummy function, <tt
647 class="LITERAL">PrivoxyWindowOpen()</tt>, during the loading
648 and rendering phase of each HTML page access, and restoring
649 the function afterward.
652 This is recommended only for browsers that cannot perform
653 this function reliably themselves. And be aware that some
654 sites require such windows in order to function normally. Use
659 <span class="emphasis"><i class=
660 "EMPHASIS">all-popups</i></span>
664 Attempt to prevent <span class="emphasis"><i class=
665 "EMPHASIS">all</i></span> pop-up windows from opening. Note
666 this should be used with even more discretion than the above,
667 since it is more likely to break some sites that require
668 pop-ups for normal usage. Use with caution.
672 <span class="emphasis"><i class=
673 "EMPHASIS">img-reorder</i></span>
677 This is a helper filter that has no value if used alone. It
678 makes the <tt class="LITERAL">banners-by-size</tt> and <tt
679 class="LITERAL">banners-by-link</tt> (see below) filters more
680 effective and should be enabled together with them.
684 <span class="emphasis"><i class=
685 "EMPHASIS">banners-by-size</i></span>
689 This filter removes image tags purely based on what size they
690 are. Fortunately for us, many ads and banner images tend to
691 conform to certain standardized sizes, which makes this
692 filter quite effective for ad stripping purposes.
695 Occasionally this filter will cause false positives on images
696 that are not ads, but just happen to be of one of the
697 standard banner sizes.
700 Recommended only for those who require extreme ad blocking.
701 The default block rules should catch 95+% of all ads <span
class="emphasis"><i class="EMPHASIS">without</i></span> this
filter.
707 <span class="emphasis"><i class=
708 "EMPHASIS">banners-by-link</i></span>
712 This is an experimental filter that attempts to kill any
713 banners if their URLs seem to point to known or suspected
714 click trackers. It is currently not of much value and is not
715 recommended for use by default.
719 <span class="emphasis"><i class="EMPHASIS">webbugs</i></span>
Webbugs are small, invisible images (technically 1x1 GIF
724 images), that are used to track users across websites, and
725 collect information on them. As an HTML page is loaded by the
726 browser, an embedded image tag causes the browser to contact
727 a third-party site, disclosing the tracking information
728 through the requested URL and/or cookies for that third-party
729 domain, without the user ever becoming aware of the
730 interaction with the third-party site. HTML-ized spam also
731 uses a similar technique to verify email addresses.
734 This filter removes the HTML code that loads such <span
735 class="QUOTE">"webbugs"</span>.
739 <span class="emphasis"><i class=
740 "EMPHASIS">tiny-textforms</i></span>
744 A rather special-purpose filter that can be used to enlarge
745 textareas (those multi-line text boxes in web forms) and turn
746 off hard word wrap in them. It was written for the
747 sourceforge.net tracker system where such boxes are a
748 nuisance, but it can be handy on other sites, too.
751 It is not recommended to use this filter as a default.
755 <span class="emphasis"><i class=
756 "EMPHASIS">jumping-windows</i></span>
Many consider windows that move or resize themselves to be
761 abusive. This filter neutralizes the related JavaScript code.
762 Note that some sites might not display or behave as intended
763 when using this filter. Use with caution.
767 <span class="emphasis"><i class=
768 "EMPHASIS">frameset-borders</i></span>
772 Some web designers seem to assume that everyone in the world
773 will view their web sites using the same browser brand and
774 version, screen resolution etc, because only that assumption
775 could explain why they'd use static frame sizes, yet prevent
776 their frames from being resized by the user, should they be
777 too small to show their whole content.
780 This filter removes the related HTML code. It should only be
781 applied to sites which need it.
785 <span class="emphasis"><i class=
786 "EMPHASIS">demoronizer</i></span>
790 Many Microsoft products that generate HTML use non-standard
791 extensions (read: violations) of the ISO 8859-1 aka Latin-1
792 character set. This can cause those HTML documents to display
793 with errors on standard-compliant platforms.
796 This filter translates the MS-only characters into Latin-1
797 equivalents. It is not necessary when using MS products, and
798 will cause corruption of all documents that use 8-bit
799 character sets other than Latin-1. It's mostly worthwhile for
800 Europeans on non-MS platforms, if weird garbage characters
801 sometimes appear on some pages, or user agents that don't
802 correct for this on the fly.
806 <span class="emphasis"><i class=
807 "EMPHASIS">shockwave-flash</i></span>
811 A filter for shockwave haters. As the name suggests, this
812 filter strips code out of web pages that is used to embed
813 shockwave flash objects.
819 <span class="emphasis"><i class=
820 "EMPHASIS">quicktime-kioskmode</i></span>
824 Change HTML code that embeds Quicktime objects so that
825 kioskmode, which prevents saving, is disabled.
829 <span class="emphasis"><i class="EMPHASIS">fun</i></span>
833 Text replacements for subversive browsing fun. Make fun of
834 your favorite Monopolist or play buzzword bingo.
838 <span class="emphasis"><i class=
839 "EMPHASIS">crude-parental</i></span>
843 A demonstration-only filter that shows how <span class=
844 "APPLICATION">Privoxy</span> can be used to delete web
845 content on a keyword basis.
849 <span class="emphasis"><i class=
850 "EMPHASIS">ie-exploits</i></span>
854 An experimental collection of text replacements to disable
855 malicious HTML and JavaScript code that exploits known
856 security holes in Internet Explorer.
859 Presently, it only protects against Nimda and a cross-site
860 scripting bug, and would need active maintenance to provide
861 more substantial protection.
865 <span class="emphasis"><i class=
866 "EMPHASIS">site-specifics</i></span>
870 Some web sites have very specific problems, the cure for
871 which doesn't apply anywhere else, or could even cause damage
875 This is a collection of such site-specific cures which should
876 only be applied to the sites they were intended for, which is
877 what the supplied <tt class="FILENAME">default.action</tt>
878 file does. Users shouldn't need to change anything regarding
883 <span class="emphasis"><i class="EMPHASIS">google</i></span>
887 A CSS based block for Google text ads. Also removes a width
888 limitation and the toolbar advertisement.
892 <span class="emphasis"><i class="EMPHASIS">yahoo</i></span>
Another CSS based block, this time for Yahoo text ads. It also
removes a width limitation.
901 <span class="emphasis"><i class="EMPHASIS">msn</i></span>
Another CSS based block, this time for MSN text ads. It also
removes tracking URLs, as well as a width limitation.
910 <span class="emphasis"><i class="EMPHASIS">blogspot</i></span>
914 Cleans up some Blogspot blogs. Read the fine print before
This filter also intentionally removes some navigation stuff
and sets the page width to 100%. As a result, some rounded
<span class="QUOTE">"corners"</span> would appear too early or
not at all. As fixing this would require a browser that
understands background-size (CSS3), they are removed instead.
926 <span class="emphasis"><i class=
927 "EMPHASIS">xml-to-html</i></span>
Server-header filter to change the Content-Type from xml to
html.
936 <span class="emphasis"><i class=
937 "EMPHASIS">html-to-xml</i></span>
Server-header filter to change the Content-Type from html to
xml.
946 <span class="emphasis"><i class="EMPHASIS">no-ping</i></span>
950 Removes the non-standard <tt class="LITERAL">ping</tt>
951 attribute from anchor and area HTML tags.
955 <span class="emphasis"><i class=
956 "EMPHASIS">hide-tor-exit-notation</i></span>
960 Client-header filter to remove the <b class="COMMAND">Tor</b>
961 exit node notation found in Host and Referer headers.
964 If <span class="APPLICATION">Privoxy</span> and <b class=
965 "COMMAND">Tor</b> are chained and <span class=
966 "APPLICATION">Privoxy</span> is configured to use socks4a,
967 one can use <span class=
968 "QUOTE">"http://www.example.org.foobar.exit/"</span> to
969 access the host <span class="QUOTE">"www.example.org"</span>
970 through the <b class="COMMAND">Tor</b> exit node <span class=
971 "QUOTE">"foobar"</span>.
974 As the HTTP client isn't aware of this notation, it treats
975 the whole string <span class=
976 "QUOTE">"www.example.org.foobar.exit"</span> as host and uses
977 it for the <span class="QUOTE">"Host"</span> and <span class=
978 "QUOTE">"Referer"</span> headers. From the server's point of
view the resulting headers are invalid and can cause
problems.
983 An invalid <span class="QUOTE">"Referer"</span> header can
984 trigger <span class="QUOTE">"hot-linking"</span> protections,
985 an invalid <span class="QUOTE">"Host"</span> header will make
986 it impossible for the server to find the right vhost (several
987 domains hosted on the same IP address).
This client-header filter removes the <span class=
"QUOTE">"foobar.exit"</span> part in those headers to prevent
the mentioned problems. Note that it only modifies the HTTP
headers. It doesn't make it impossible for the server to
detect your <b class="COMMAND">Tor</b> exit node based on the
IP address the request is coming from.
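<p>
How such a header rewrite might look can be sketched in Python. The
pattern below is an illustration written for this example, not a copy of
the job shipped in <tt class="FILENAME">default.filter</tt>. Python's
<tt class="LITERAL">re</tt> module stands in for PCRS and the function
name is made up.
</p>

```python
import re

# Keep everything up to the real host name in group 1 and drop the
# trailing ".<node>.exit" part. [^/]* stays within the host portion.
EXIT_NOTATION = re.compile(
    r"^((?:Referer|Host):\s*(?:https?://)?[^/]*)\.[^./]*?\.exit",
    re.IGNORECASE,
)


def strip_exit_notation(header: str) -> str:
    """Remove the Tor exit node notation from a Host or Referer header."""
    return EXIT_NOTATION.sub(r"\1", header, count=1)


print(strip_exit_notation("Host: www.example.org.foobar.exit"))
# Host: www.example.org
print(strip_exit_notation("Referer: http://www.example.org.foobar.exit/page"))
# Referer: http://www.example.org/page
```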
1003 <a name="EXTERNAL-FILTER-SYNTAX">9.3. External filter syntax</a>
1006 External filters are scripts or programs that can modify the
1007 content in case common <tt class="LITERAL"><a href=
1008 "actions-file.html#FILTER">filters</a></tt> aren't powerful enough.
1011 External filters can be written in any language the platform <span
1012 class="APPLICATION">Privoxy</span> runs on supports.
1015 They are controlled with the <tt class="LITERAL"><a href=
1016 "actions-file.html#EXTERNAL-FILTER">external-filter</a></tt> action
1017 and have to be defined in the <tt class="LITERAL"><a href=
1018 "config.html#FILTERFILE">filterfile</a></tt> first.
1021 The header looks like any other filter, but instead of pcrs jobs,
1022 external filters contain a single job which can be a program or a
1023 shell script (which may call other scripts or programs).
1026 External filters read the content from STDIN and write the
1027 rewritten content to STDOUT. The environment variables PRIVOXY_URL,
1028 PRIVOXY_PATH, PRIVOXY_HOST, PRIVOXY_ORIGIN, PRIVOXY_LISTEN_ADDRESS
1029 can be used to get some details about the client request.
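<p>
A minimal external filter could look like the following Python sketch.
It relies only on what is described above: content arrives on STDIN, the
rewritten content goes to STDOUT, and
<tt class="LITERAL">PRIVOXY_HOST</tt> is set in the environment. The
rewrite rule itself and the host name
<tt class="LITERAL">trusted.example.org</tt> are made up for
illustration.
</p>

```python
import os
import sys


def rewrite(content: bytes, host: str) -> bytes:
    """Replace "foo" with "bar", except on one hypothetical exempt host."""
    if host == "trusted.example.org":  # made-up host name
        return content
    return content.replace(b"foo", b"bar")


def main() -> None:
    host = os.environ.get("PRIVOXY_HOST", "")
    # Work on bytes, since the content is not guaranteed to be text.
    content = sys.stdin.buffer.read()
    sys.stdout.buffer.write(rewrite(content, host))


# Only act as a filter when the PRIVOXY_* environment is present, so the
# rewrite() function can also be imported and tested on its own.
if __name__ == "__main__" and "PRIVOXY_HOST" in os.environ:
    main()
```

<p>
Keeping the rewrite logic in a separate function makes it possible to
test the filter without running <span class="APPLICATION">Privoxy</span>.
</p>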
<span class="APPLICATION">Privoxy</span> will temporarily store the
content to filter in the <tt class="LITERAL"><a href=
"config.html#TEMPORARY-DIRECTORY">temporary-directory</a></tt>.
1038 <table border="0" bgcolor="#E0E0E0" width="100%">
1041 <pre class="SCREEN">
1042 EXTERNAL-FILTER: cat Pointless example filter that doesn't actually modify the content
1045 # Incorrect reimplementation of the filter above in POSIX shell.
1047 # Note that it's a single job that spans multiple lines, the line
1048 # breaks are not passed to the shell, thus the semicolons are required.
1050 # If the script isn't trivial, it is recommended to put it into an external file.
1052 # In general, writing external filters entirely in POSIX shell is not
1053 # considered a good idea.
1054 EXTERNAL-FILTER: cat2 Pointless example filter that despite its name may actually modify the content
EXTERNAL-FILTER: rotate-image Rotate an image by 180 degrees. Test filter with limited value.
1061 /usr/local/bin/convert - -rotate 180 -
1063 EXTERNAL-FILTER: citation-needed Adds a "[citation needed]" tag to an image. The coordinates may need adjustment.
1064 /usr/local/bin/convert - -pointsize 16 -fill white -annotate +17+418 "[citation needed]" -
1070 <div class="WARNING">
1071 <table class="WARNING" border="1" width="100%">
1080 Currently external filters are executed with <span class=
1081 "APPLICATION">Privoxy</span>'s privileges! Only use
1082 external filters you understand and trust.
External filters are experimental and the syntax may change in the
future.
1094 <div class="NAVFOOTER">
1095 <hr align="LEFT" width="100%">
1096 <table summary="Footer navigation table" width="100%" border="0"
1097 cellpadding="0" cellspacing="0">
1099 <td width="33%" align="left" valign="top">
1100 <a href="actions-file.html" accesskey="P">Prev</a>
1102 <td width="34%" align="center" valign="top">
1103 <a href="index.html" accesskey="H">Home</a>
1105 <td width="33%" align="right" valign="top">
1106 <a href="templates.html" accesskey="N">Next</a>
1110 <td width="33%" align="left" valign="top">
1113 <td width="34%" align="center" valign="top">
1116 <td width="33%" align="right" valign="top">
1117 Privoxy's Template Files