1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
8 CONTENT="Modular DocBook HTML Stylesheet Version 1.79"><LINK
10 TITLE="Privoxy 3.0.27 User Manual"
11 HREF="index.html"><LINK
14 HREF="actions-file.html"><LINK
16 TITLE="Privoxy's Template Files"
17 HREF="templates.html"><LINK
20 HREF="../p_doc.css"><META
21 HTTP-EQUIV="Content-Type"
24 <LINK REL="STYLESHEET" TYPE="text/css" HREF="p_doc.css">
36 SUMMARY="Header navigation table"
45 >Privoxy 3.0.27 User Manual</TH
53 HREF="actions-file.html"
85 > On-the-fly text substitutions need
86 to be defined in a <SPAN
90 can then be invoked as an <SPAN
98 > supports three different pcrs-based filter actions:
102 HREF="actions-file.html#FILTER"
106 rewrite the content that is sent to the client,
110 HREF="actions-file.html#CLIENT-HEADER-FILTER"
111 >client-header-filter</A
114 to rewrite headers that are sent by the client, and
118 HREF="actions-file.html#SERVER-HEADER-FILTER"
119 >server-header-filter</A
122 to rewrite headers that are sent by the server.</P
127 > also supports two tagger actions:
131 HREF="actions-file.html#CLIENT-HEADER-TAGGER"
132 >client-header-tagger</A
139 HREF="actions-file.html#SERVER-HEADER-TAGGER"
140 >server-header-tagger</A
143 Taggers and filters use the same syntax in the filter files; the difference
144 is that taggers don't modify the text they are filtering, but use a rewritten
145 version of the filtered text as a tag. The tags can then be used to change the
146 applying actions through sections with <A
147 HREF="actions-file.html#TAG-PATTERN"
158 HREF="actions-file.html#EXTERNAL-FILTER"
165 HREF="filter-file.html#EXTERNAL-FILTER-SYNTAX"
169 written in proper programming languages.</P
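><P
> As a minimal sketch of how a tagger and a tag pattern work together, assume
a tagger with the made-up name <TT CLASS="LITERAL">example-user-agent</TT
>. The tagger name and the user agent pattern below are illustrations only
and are not part of the distributed files:</P
><PRE CLASS="SCREEN"
># Filter file: the result of the job becomes the tag, the header itself is left untouched.
CLIENT-HEADER-TAGGER: example-user-agent Tags the request with the complete User-Agent header
s@^(User-Agent:.*)@$1@i

# Actions file: enable the (hypothetical) tagger for all requests ...
{+client-header-tagger{example-user-agent}}
/

# ... and change the applying actions for requests that carry a matching tag.
{-filter}
TAG:^User-Agent: ExampleDownloader/</PRE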
171 > Multiple filter files can be defined through the <TT
174 HREF="config.html#FILTERFILE"
177 > config directive. The filters
178 as supplied by the developers are located in
182 >. It is recommended that any locally
183 defined or modified filters go in a separately defined file such as
190 > Common tasks for content filters are to eliminate common annoyances in
191 HTML and JavaScript, such as pop-up windows,
192 exit consoles, crippled windows without navigation tools, the
193 infamous <BLINK> tag etc, to suppress images with certain
194 width and height attributes (standard banner sizes or web-bugs),
195 or just to have fun.</P
197 > Enabled content filters are applied to any content whose
200 >"Content Type"</SPAN
201 > header is recognised as a sign
202 of text-based content, with the exception of <TT
207 HREF="actions-file.html#FORCE-TEXT-MODE"
210 to also filter other content.</P
212 > Substitutions are made at the source level, so if you want to <SPAN
216 > filters, you should first be familiar with HTML syntax,
217 and, of course, regular expressions.</P
220 HREF="actions-file.html"
223 filter file is organized in sections, which are called <SPAN
230 here. Each filter consists of a heading line that starts with one of the
243 >CLIENT-HEADER-FILTER:</TT
246 >SERVER-HEADER-FILTER:</TT
248 followed by the filter's <SPAN
254 >, and a short (one line)
261 > of what it does. Below that line
268 >, i.e. lines that define the actual
269 text substitutions. By convention, the name of a filter
270 should describe what the filter <SPAN
277 comment is used in the <A
278 HREF="http://config.privoxy.org/"
284 > Once a filter called <TT
290 in the filter file, it can be invoked by using an action of the form
294 HREF="actions-file.html#FILTER"
304 HREF="actions-file.html"
308 > Filter definitions start with a header line that contains the filter
309 type, the filter name and the filter description.
310 A content filter header line for a filter called <SPAN
323 >FILTER: foo Replace all "foo" with "bar"</PRE
328 > Below that line, and up to the next header line, come the jobs that
329 define what text replacements the filter executes. They are specified
330 in a syntax that imitates <A
331 HREF="http://www.perl.org/"
338 > operator. If you are familiar with Perl, you
339 will find this to be quite intuitive, and may want to look at the
340 PCRS documentation for the subtle differences to Perl behaviour.</P
342 > Most notably, the non-standard option letter <TT
346 which turns the default to ungreedy matching (add <TT
350 quantifiers to turn them greedy again).</P
352 > The non-standard option letter <TT
356 to use the variables $host, $origin (the IP address the request came from),
357 $path, $url and $listen-address (the address on which Privoxy accepted the
358 client request, for example 127.0.0.1:8118).
359 They will be replaced with the value they refer to before the filter
362 > Note that '$' is a bad choice for a delimiter in a dynamic filter as you
363 might end up with unintended variables if you use a variable name
364 directly after the delimiter. Variables will be resolved without
365 escaping anything, therefore you also have to be careful not to choose
366 delimiters that appear in the replacement text. For example '<' should
367 be safe, while '?' will sooner or later cause conflicts with $url.</P
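><P
> A dynamic filter could look like the following sketch. The filter name and
its purpose are invented for illustration; only the <TT CLASS="LITERAL">D</TT
> option and the $host variable are taken from the description above:</P
><PRE CLASS="SCREEN"
># Hypothetical example: pretend every request was referred by the requested host itself.
# '&#60;' is used as the delimiter, so it must not occur in the pattern or the substitute.
CLIENT-HEADER-FILTER: example-host-referer Sets the Referer to the root of the requested host
s&#60;^Referer:.*&#60;Referer: http://$host/&#60;Di</PRE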
369 > The non-standard option letter <TT
373 parsing for backreferences in the substitute. Use it if you want to include
374 text like '$&' in your substitute without quoting.</P
378 HREF="http://en.wikipedia.org/wiki/Regular_expressions"
385 >, you might want to take a look at
387 HREF="appendix.html#REGEX"
388 >Appendix on regular expressions</A
391 HREF="http://perldoc.perl.org/perlre.html"
397 HREF="http://perldoc.perl.org/perlop.html"
403 > operator's syntax</A
405 HREF="http://perldoc.perl.org/perlre.html"
410 The below examples might also help to get you started.</P
416 NAME="FILTER-FILE-TUT"
417 >9.1. Filter File Tutorial</A
420 > Now, let's complete our <SPAN
423 > content filter. We have already defined
424 the heading, but the jobs are still missing. Since all it does is to replace
431 >, there is only one (trivial) job
446 > But wait! Didn't the comment say that <SPAN
456 > should be replaced? Our current job will only take
457 care of the first <SPAN
460 > on each page. For global substitution,
461 we'll need to add the <TT
478 > Our complete filter now looks like this:</P
487 >FILTER: foo Replace all "foo" with "bar"
493 > Let's look at some real filters for more interesting examples. Here you see
494 a filter that protects against some common annoyances that arise from JavaScript
495 abuse. Let's look at its jobs one after the other:</P
504 >FILTER: js-annoyances Get rid of particularly annoying JavaScript abuse
506 # Get rid of JavaScript referrer tracking. Test page: http://www.randomoddness.com/untitled.htm
508 s|(<script.*)document\.referrer(.*</script>)|$1"Not Your Business!"$2|Usg</PRE
513 > Following the header line and a comment, you see the job. Note that it uses
517 > as the delimiter instead of <TT
521 the pattern contains a forward slash, which would otherwise have to be escaped
527 > Now, let's examine the pattern: it starts with the text <TT
531 enclosed in parentheses. Since the dot matches any character, and <TT
537 >"Match an arbitrary number of the element left of myself"</SPAN
549 it matches the whole page, from the start of the first <script> tag.</P
551 > That's more than we want, but the pattern continues: <TT
553 >document\.referrer</TT
555 matches only the exact string <SPAN
557 >"document.referrer"</SPAN
565 >, i.e. preceded by a backslash, to take away its
566 special meaning as a joker, and make it just a regular dot. So far, the meaning is:
567 Match from the start of the first <script> tag in the page, up to, and including,
570 >"document.referrer"</SPAN
578 in the page (and appear in that order).</P
580 > But there's still more pattern to go. The next element, again enclosed in parentheses,
583 >.*</script></TT
584 >. You already know what <TT
588 means, so the whole pattern translates to: Match from the start of the first <script>
589 tag in a page to the end of the last <script> tag, provided that the text
592 >"document.referrer"</SPAN
593 > appears somewhere in between.</P
595 > This is still not the whole story, since we have ignored the options and the parentheses:
596 The portions of the page matched by sub-patterns that are enclosed in parentheses will be
597 remembered and be available through the variables <TT
601 the substitute. The <TT
604 > option switches to ungreedy matching, which means
608 > in the pattern will only <SPAN
612 text in between <SPAN
624 >"document.referrer"</SPAN
625 >, and that the second <TT
629 only span the text up to the <SPAN
637 >"</script>"</SPAN
639 tag. Furthermore, the <TT
642 > option says that the match may span
643 multiple lines in the page, and the <TT
646 > option again means that the
647 substitution is global.</P
649 > So, to summarize, the pattern means: Match all scripts that contain the text
652 >"document.referrer"</SPAN
653 >. Remember the parts of the script from
654 (and including) the start tag up to (and excluding) the string
657 >"document.referrer"</SPAN
661 >, and the part following
662 that string, up to and including the closing tag, as <TT
667 > Now the pattern is deciphered, but wasn't this about substituting things? So
668 let's look at the substitute: <TT
670 >$1"Not Your Business!"$2</TT
672 easy to read: The text remembered as <TT
678 >"Not Your Business!"</TT
686 the quotation marks!), followed by the text remembered as <TT
690 This produces an exact copy of the original string, with the middle part
693 >"document.referrer"</SPAN
700 > The whole job now reads: Replace <SPAN
702 >"document.referrer"</SPAN
706 >"Not Your Business!"</TT
707 > wherever it appears inside a
708 <script> tag. Note that this job won't break JavaScript syntax,
709 since both the original and the replacement are syntactically valid
710 string objects. The script just won't have access to the referrer
711 information anymore.</P
713 > We'll show you two other jobs from the JavaScript taming department, but
714 this time only point out the constructs of special interest:</P
723 ># The status bar is for displaying link targets, not pointless blahblah
725 s/window\.status\s*=\s*(['"]).*?\1/dUmMy=1/ig</PRE
733 > stands for whitespace characters (space, tab, newline,
734 carriage return, form feed), so that <TT
740 or more whitespace"</SPAN
748 makes this matching of arbitrary text ungreedy. (Note that the <TT
752 option is not set). The <TT
755 > construct means: <SPAN
764 > a double quote"</SPAN
769 a back-reference to the first parenthesis just like <TT
773 with the difference that in the <SPAN
779 >, a backslash indicates
780 a back-reference, whereas in the <SPAN
786 >, it's the dollar.</P
788 > So what does this job do? It replaces assignments of single- or double-quoted
791 >"window.status"</SPAN
792 > object with a dummy assignment
793 (using a variable name that is hopefully odd enough not to conflict with
794 real variables in scripts). Thus, it catches many cases where e.g. pointless
795 descriptions are displayed in the status bar instead of the link target when
796 you move your mouse over links.</P
805 ># Kill OnUnload popups. Yummy. Test: http://www.zdnet.com/zdsubs/yahoo/tree/yfs.html
807 s/(<body [^>]*)onunload(.*>)/$1never$2/iU</PRE
814 HREF="http://www.w3.org/TR/2000/REC-DOM-Level-2-Events-20001113/events.html#Events-eventgroupings-htmlevents"
818 > in the HTML DOM was a <SPAN
825 When I close a browser window, I want it to close and die. Basta.
826 This job replaces the <SPAN
832 >"<body>"</SPAN
833 > tags with the dummy word <TT
840 > option makes the pattern matching
841 case-insensitive. Also note that ungreedy matching alone doesn't always guarantee
842 a minimal match: In the first parenthesis, we had to use <TT
849 > to prevent the match from exceeding the
850 <body> tag if it doesn't contain <SPAN
856 > The last example is from the fun department:</P
865 >FILTER: fun Fun text replacements
867 # Spice the daily news:
869 s/microsoft(?!\.com)/MicroSuck/ig</PRE
877 > part (a so-called negative lookahead)
878 in the job's pattern, which means: Don't match, if the string
882 > appears directly following <SPAN
886 in the page. This prevents links to microsoft.com from being trashed, while
887 still replacing the word everywhere else.</P
896 ># Buzzword Bingo (example for extended regex syntax)
898 s* industry[ -]leading \
900 | customer[ -]focused \
902 | award[ -]winning # Comments are OK, too! \
903 | high[ -]performance \
904 | solutions[ -]based \
908 *<font color="red"><b>BINGO!</b></font> \
917 > option in this job turns on extended syntax, and allows for
918 e.g. the liberal use of (non-interpreted!) whitespace for nicer formatting.</P
920 > You get the idea?</P
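><P
> To try out the tutorial filters, they have to be enabled in an actions file.
A minimal sketch, in which the URL pattern is only a placeholder:</P
><PRE CLASS="SCREEN"
># Apply the "js-annoyances" and "fun" filters to one site only.
{+filter{js-annoyances} +filter{fun}}
.example.com</PRE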
927 NAME="PREDEFINED-FILTERS"
928 >9.2. The Pre-defined Filters</A
931 >The distribution <TT
934 > file contains a selection of
935 pre-defined filters for your convenience:</P
951 > The purpose of this filter is to get rid of particularly annoying JavaScript abuse.
959 > replaces JavaScript references to the browser's referrer information
960 with the string "Not Your Business!". This complements the <TT
963 HREF="actions-file.html#HIDE-REFERRER"
966 > action on the content level.
971 > removes the bindings to the DOM's
973 HREF="http://www.w3.org/TR/2000/REC-DOM-Level-2-Events-20001113/events.html#Events-eventgroupings-htmlevents"
977 > which we feel has no right to exist and is responsible for most <SPAN
979 >"exit consoles"</SPAN
981 nasty windows that pop up when you close another one.
986 > removes code that causes new windows to be opened with undesired properties, such as being
987 full-screen, non-resizeable, without location, status or menu bar etc.
992 > Use with caution. This is an aggressive filter, and can break sites that
993 rely heavily on JavaScript.
1006 > This is a very radical measure. It removes virtually all JavaScript event bindings, which
1007 means that scripts cannot react to user actions such as mouse movements or clicks, window
1008 resizing etc, anymore. Use with caution!
1015 >strongly discourage</I
1017 > using this filter as a default since it breaks
1018 many legitimate scripts. It is meant for use only on extra-nasty sites (should you really
1032 > This filter will undo many common instances of HTML based abuse.
1042 are neutralized (yeah baby!), and browser windows will be created as
1043 resizeable (as of course they should be!), and will have location,
1044 scroll and menu bars -- even if specified otherwise.
1057 > Most cookies are set in the HTTP dialog, where they can be intercepted
1062 HREF="actions-file.html#CRUNCH-INCOMING-COOKIES"
1063 >crunch-incoming-cookies</A
1069 HREF="actions-file.html#CRUNCH-OUTGOING-COOKIES"
1070 >crunch-outgoing-cookies</A
1073 actions. But web sites increasingly make use of HTML meta tags and JavaScript
1074 to sneak cookies to the browser on the content level.
1077 > This filter disables most HTML and JavaScript code that reads or sets
1078 cookies. It cannot detect all clever uses of these types of code, so it
1079 should not be relied on as an absolute fix. Use it wherever you would also
1080 use the cookie crunch actions.
1093 > Disable any refresh tags if the interval is greater than nine seconds (so
1094 that redirections done via refresh tags are not destroyed). This is useful
1095 for dial-on-demand setups, or for those who find this HTML feature
1104 >unsolicited-popups</I
1109 > This filter attempts to prevent only <SPAN
1111 >"unsolicited"</SPAN
1113 windows from opening, yet still allow pop-up windows that the user
1114 has explicitly chosen to open. It was added in version 3.0.1,
1115 as an improvement over earlier such filters.
1118 > Technical note: The filter works by redefining the window.open JavaScript
1119 function to a dummy function, <TT
1121 >PrivoxyWindowOpen()</TT
1123 during the loading and rendering phase of each HTML page access, and
1124 restoring the function afterward.
1127 > This is recommended only for browsers that cannot perform this function
1128 reliably themselves. And be aware that some sites require such windows
1129 in order to function normally. Use with caution.
1142 > Attempt to prevent <SPAN
1148 > pop-up windows from opening.
1149 Note this should be used with even more discretion than the above, since
1150 it is more likely to break some sites that require pop-ups for normal
1151 usage. Use with caution.
1164 > This is a helper filter that has no value if used alone. It makes the
1167 >banners-by-size</TT
1170 >banners-by-link</TT
1172 (see below) filters more effective and should be enabled together with them.
1185 > This filter removes image tags purely based on what size they are. Fortunately
1186 for us, many ads and banner images tend to conform to certain standardized
1187 sizes, which makes this filter quite effective for ad stripping purposes.
1190 > Occasionally this filter will cause false positives on images that are not ads,
1191 but just happen to be of one of the standard banner sizes.
1194 > Recommended only for those who require extreme ad blocking. The default
1195 block rules should catch 95+% of all ads <SPAN
1201 > this filter enabled.
1214 > This is an experimental filter that attempts to kill any banners if
1215 their URLs seem to point to known or suspected click trackers. It is currently
1216 not of much value and is not recommended for use by default.
1229 > Webbugs are small, invisible images (technically 1x1 GIF images), that
1230 are used to track users across websites, and collect information on them.
1231 As an HTML page is loaded by the browser, an embedded image tag causes the
1232 browser to contact a third-party site, disclosing the tracking information
1233 through the requested URL and/or cookies for that third-party domain, without
1234 the user ever becoming aware of the interaction with the third-party site.
1235 HTML-ized spam also uses a similar technique to verify email addresses.
1238 > This filter removes the HTML code that loads such <SPAN
1254 > A rather special-purpose filter that can be used to enlarge textareas (those
1255 multi-line text boxes in web forms) and turn off hard word wrap in them.
1256 It was written for the sourceforge.net tracker system where such boxes are
1257 a nuisance, but it can be handy on other sites, too.
1260 > It is not recommended to use this filter as a default.
1273 > Many consider windows that move, or resize themselves to be abusive. This filter
1274 neutralizes the related JavaScript code. Note that some sites might not display
1275 or behave as intended when using this filter. Use with caution.
1283 >frameset-borders</I
1288 > Some web designers seem to assume that everyone in the world will view their
1289 web sites using the same browser brand and version, screen resolution etc,
1290 because only that assumption could explain why they'd use static frame sizes,
1291 yet prevent their frames from being resized by the user, should they be too
1292 small to show their whole content.
1295 > This filter removes the related HTML code. It should only be applied to sites
1309 > Many Microsoft products that generate HTML use non-standard extensions (read:
1310 violations) of the ISO 8859-1 aka Latin-1 character set. This can cause those
1311 HTML documents to display with errors on standard-compliant platforms.
1314 > This filter translates the MS-only characters into Latin-1 equivalents.
1315 It is not necessary when using MS products, and will cause corruption of
1316 all documents that use 8-bit character sets other than Latin-1. It's mostly
1317 worthwhile for Europeans on non-MS platforms, if weird garbage characters
1318 sometimes appear on some pages, or user agents that don't correct for this on
1332 > A filter for shockwave haters. As the name suggests, this filter strips code
1333 out of web pages that is used to embed shockwave flash objects.
1343 >quicktime-kioskmode</I
1348 > Change HTML code that embeds Quicktime objects so that kioskmode, which
1349 prevents saving, is disabled.
1362 > Text replacements for subversive browsing fun. Make fun of your favorite
1363 Monopolist or play buzzword bingo.
1376 > A demonstration-only filter that shows how <SPAN
1380 can be used to delete web content on a keyword basis.
1393 > An experimental collection of text replacements to disable malicious HTML and JavaScript
1394 code that exploits known security holes in Internet Explorer.
1397 > Presently, it only protects against Nimda and a cross-site scripting bug, and
1398 would need active maintenance to provide more substantial protection.
1411 > Some web sites have very specific problems, the cure for which doesn't apply
1412 anywhere else, or could even cause damage on other sites.
1415 > This is a collection of such site-specific cures which should only be applied
1416 to the sites they were intended for, which is what the supplied
1420 > file does. Users shouldn't need to change
1421 anything regarding this filter.
1434 > A CSS based block for Google text ads. Also removes a width limitation
1435 and the toolbar advertisement.
1448 > Another CSS based block, this time for Yahoo text ads. And removes
1449 a width limitation as well.
1462 > Another CSS based block, this time for MSN text ads. And removes
1463 tracking URLs, as well as a width limitation.
1476 > Cleans up some Blogspot blogs. Read the fine print before using this one!
1479 > This filter also intentionally removes some navigation stuff and sets the
1480 page width to 100%. As a result, some rounded <SPAN
1484 appear too early or not at all and as fixing this would require a browser
1485 that understands background-size (CSS3), they are removed instead.
1498 > Server-header filter to change the Content-Type from xml to html.
1511 > Server-header filter to change the Content-Type from html to xml.
1524 > Removes the non-standard <TT
1528 anchor and area HTML tags.
1536 >hide-tor-exit-notation</I
1541 > Client-header filter to remove the <B
1544 > exit node notation
1545 found in Host and Referer headers.
1554 > are chained and <SPAN
1558 is configured to use socks4a, one can use <SPAN
1560 >"http://www.example.org.foobar.exit/"</SPAN
1562 to access the host <SPAN
1564 >"www.example.org"</SPAN
1575 > As the HTTP client isn't aware of this notation, it treats the
1578 >"www.example.org.foobar.exit"</SPAN
1579 > as host and uses it
1587 server's point of view the resulting headers are invalid and can cause problems.
1593 > header can trigger <SPAN
1595 >"hot-linking"</SPAN
1597 protections, an invalid <SPAN
1600 > header will make it impossible for
1601 the server to find the right vhost (several domains hosted on the same IP address).
1604 > This client-header filter removes the <SPAN
1607 > part in those headers
1608 to prevent the mentioned problems. Note that it only modifies
1609 the HTTP headers; it doesn't make it impossible for the server
1613 > exit node based on the IP address
1614 the request is coming from.
1625 NAME="EXTERNAL-FILTER-SYNTAX"
1626 >9.3. External filter syntax</A
1629 > External filters are scripts or programs that can modify the content in
1633 HREF="actions-file.html#FILTER"
1637 aren't powerful enough.</P
1639 > External filters can be written in any language the platform <SPAN
1645 > They are controlled with the
1649 HREF="actions-file.html#EXTERNAL-FILTER"
1653 and have to be defined in the <TT
1656 HREF="config.html#FILTERFILE"
1662 > The header looks like any other filter, but instead of pcrs jobs, external
1663 filters contain a single job which can be a program or a shell script (which
1664 may call other scripts or programs).</P
1666 > External filters read the content from STDIN and write the rewritten
1668 The environment variables PRIVOXY_URL, PRIVOXY_PATH, PRIVOXY_HOST,
1669 PRIVOXY_ORIGIN, PRIVOXY_LISTEN_ADDRESS can be used to get some details
1670 about the client request.</P
1675 > will temporarily store the content to filter in the
1679 HREF="config.html#TEMPORARY-DIRECTORY"
1680 >temporary-directory</A
1691 >EXTERNAL-FILTER: cat Pointless example filter that doesn't actually modify the content
1694 # Incorrect reimplementation of the filter above in POSIX shell.
1696 # Note that it's a single job that spans multiple lines, the line
1697 # breaks are not passed to the shell, thus the semicolons are required.
1699 # If the script isn't trivial, it is recommended to put it into an external file.
1701 # In general, writing external filters entirely in POSIX shell is not
1702 # considered a good idea.
1703 EXTERNAL-FILTER: cat2 Pointless example filter that despite its name may actually modify the content
1709 EXTERNAL-FILTER: rotate-image Rotate an image by 180 degrees. Test filter with limited value.
1710 /usr/local/bin/convert - -rotate 180 -
1712 EXTERNAL-FILTER: citation-needed Adds a "[citation needed]" tag to an image. The coordinates may need adjustment.
1713 /usr/local/bin/convert - -pointsize 16 -fill white -annotate +17+418 "[citation needed]" -</PRE
1736 > Currently external filters are executed with <SPAN
1740 Only use external filters you understand and trust.
1747 > External filters are experimental and the syntax may change in the future.</P
1755 SUMMARY="Footer navigation table"
1766 HREF="actions-file.html"
1784 HREF="templates.html"
1804 >Privoxy's Template Files</TD