<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01
Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
8 <meta name="GENERATOR" content=
9 "Modular DocBook HTML Stylesheet Version 1.79">
10 <link rel="HOME" title="Privoxy 3.0.25 User Manual" href="index.html">
11 <link rel="PREVIOUS" title="See Also" href="seealso.html">
12 <link rel="STYLESHEET" type="text/css" href="../p_doc.css">
13 <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
14 <link rel="STYLESHEET" type="text/css" href="p_doc.css">
16 <body class="SECT1" bgcolor="#EEEEEE" text="#000000" link="#0000FF" vlink=
17 "#840084" alink="#0000FF">
18 <div class="NAVHEADER">
19 <table summary="Header navigation table" width="100%" border="0"
20 cellpadding="0" cellspacing="0">
22 <th colspan="3" align="center">
23 Privoxy 3.0.25 User Manual
27 <td width="10%" align="left" valign="bottom">
28 <a href="seealso.html" accesskey="P">Prev</a>
30 <td width="80%" align="center" valign="bottom">
32 <td width="10%" align="right" valign="bottom">
37 <hr align="LEFT" width="100%">
41 <a name="APPENDIX">14. Appendix</a>
45 <a name="REGEX">14.1. Regular Expressions</a>
48 <span class="APPLICATION">Privoxy</span> uses Perl-style <span
49 class="QUOTE">"regular expressions"</span> in its <a href=
50 "actions-file.html">actions files</a> and <a href=
51 "filter-file.html">filter file</a>, through the <a href=
52 "http://www.pcre.org/" target="_top">PCRE</a> and <span class=
53 "APPLICATION">PCRS</span> libraries.
56 If you are reading this, you probably don't understand what <span
57 class="QUOTE">"regular expressions"</span> are, or what they can
58 do. So this will be a very brief introduction only. A full
59 explanation would require a <a href=
60 "http://www.oreilly.com/catalog/regex/" target="_top">book</a> ;-)
Regular expressions provide a language to describe patterns that
can be run against strings of characters (letters, numbers, etc.), to
see if they match the string or not. The patterns are themselves
(sometimes complex) strings of literal characters, combined with
wild-cards and other special characters, called meta-characters.
The <span class="QUOTE">"meta-characters"</span> have special
meanings and are used to build complex patterns to be matched
against. Perl Compatible Regular Expressions are an especially
convenient <span class="QUOTE">"dialect"</span> of the regular
expression language.
75 To make a simple analogy, we do something similar when we use
76 wild-card characters when listing files with the <b class=
77 "COMMAND">dir</b> command in DOS. <tt class="LITERAL">*.*</tt>
78 matches all filenames. The <span class="QUOTE">"special"</span>
79 character here is the asterisk which matches any and all
80 characters. We can be more specific and use <tt class=
81 "LITERAL">?</tt> to match just individual characters. So <span
class="QUOTE">"dir file?.txt"</span> would match <span class=
83 "QUOTE">"file1.txt"</span>, <span class="QUOTE">"file2.txt"</span>,
84 etc. We are pattern matching, using a similar technique to <span
85 class="QUOTE">"regular expressions"</span>!
88 Regular expressions do essentially the same thing, but are much,
89 much more powerful. There are many more <span class=
90 "QUOTE">"special characters"</span> and ways of building complex
patterns, however. Let's look at a few of the common ones, and then
some examples:
98 <span class="emphasis"><i class="EMPHASIS">.</i></span> -
99 Matches any single character, e.g. <span class=
100 "QUOTE">"a"</span>, <span class="QUOTE">"A"</span>, <span
101 class="QUOTE">"4"</span>, <span class="QUOTE">":"</span>, or
102 <span class="QUOTE">"@"</span>.
<span class="emphasis"><i class="EMPHASIS">?</i></span> - The
preceding character or expression is matched ZERO or ONE
times.
<span class="emphasis"><i class="EMPHASIS">+</i></span> - The
preceding character or expression is matched ONE or MORE
times.
<span class="emphasis"><i class="EMPHASIS">*</i></span> - The
preceding character or expression is matched ZERO or MORE
times.
148 <span class="emphasis"><i class="EMPHASIS">\</i></span> - The
149 <span class="QUOTE">"escape"</span> character denotes that
150 the following character should be taken literally. This is
151 used where one of the special characters (e.g. <span class=
152 "QUOTE">"."</span>) needs to be taken literally and not as a
153 special meta-character. Example: <span class=
154 "QUOTE">"example\.com"</span>, makes sure the period is
155 recognized only as a period (and not expanded to its
156 meta-character meaning of any single character).
166 <span class="emphasis"><i class="EMPHASIS">[ ]</i></span> -
167 Characters enclosed in brackets will be matched if any of the
168 enclosed characters are encountered. For instance, <span
169 class="QUOTE">"[0-9]"</span> matches any numeric digit (zero
170 through nine). As an example, we can combine this with <span
class="QUOTE">"+"</span> to match any digit one or more
172 times: <span class="QUOTE">"[0-9]+"</span>.
<span class="emphasis"><i class="EMPHASIS">( )</i></span> -
Parentheses are used to group a sub-expression, or multiple
sub-expressions.
194 <span class="emphasis"><i class="EMPHASIS">|</i></span> - The
195 <span class="QUOTE">"bar"</span> character works like an
196 <span class="QUOTE">"or"</span> conditional statement. A
197 match is successful if the sub-expression on either side of
198 <span class="QUOTE">"|"</span> matches. As an example: <span
199 class="QUOTE">"/(this|that) example/"</span> uses grouping
200 and the bar character and would match either <span class=
201 "QUOTE">"this example"</span> or <span class="QUOTE">"that
202 example"</span>, and nothing else.
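<p>As a rough illustration, the meta-characters above can be tried out in a few lines of Python. Python's <tt class="LITERAL">re</tt> module is used here only as a convenient stand-in; it is not the engine Privoxy uses, although these basic constructs behave the same way. The sample strings and the helper name <tt class="LITERAL">full_match</tt> are made up for this sketch.</p>

```python
import re

# Each entry: (pattern, text that should match in full, text that should not).
# The sample strings are invented for this illustration.
EXAMPLES = [
    (r"a.c", "abc", "ac"),                            # . is exactly one character
    (r"colou?r", "color", "colouur"),                 # ? is zero or one
    (r"go+gle", "google", "ggle"),                    # + is one or more
    (r"go*gle", "ggle", "gagle"),                     # * is zero or more
    (r"example\.com", "example.com", "exampleXcom"),  # \ makes the dot literal
    (r"[0-9]+", "2024", "20x4"),                      # [ ] is a character class
    (r"(this|that) example", "that example", "those example"),  # ( ) and |
]


def full_match(pattern, text):
    """Return True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


for pattern, good, bad in EXAMPLES:
    print(pattern, full_match(pattern, good), full_match(pattern, bad))
```

<p>Each printed line should show the pattern followed by one match and one non-match.</p>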
These are just some of the ones you are likely to use when matching
URLs with <span class="APPLICATION">Privoxy</span>, and they are a long
way from a definitive list. This is enough to get us started with a
few simple examples which may be more illuminating:
215 <span class="emphasis"><i class="EMPHASIS"><tt class=
216 "LITERAL">/.*/banners/.*</tt></i></span> - A simple example that
217 uses the common combination of <span class="QUOTE">"."</span> and
218 <span class="QUOTE">"*"</span> to denote any character, zero or
219 more times. In other words, any string at all. So we start with a
220 literal forward slash, then our regular expression pattern (<span
class="QUOTE">".*"</span>), another literal forward slash, the
222 string <span class="QUOTE">"banners"</span>, another forward slash,
223 and lastly another <span class="QUOTE">".*"</span>. We are building
224 a directory path here. This will match any file with the path that
225 has a directory named <span class="QUOTE">"banners"</span> in it.
226 The <span class="QUOTE">".*"</span> matches any characters, and
227 this could conceivably be more forward slashes, so it might expand
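into a much longer looking path. For example, this could match: <span class=
"QUOTE">"/eye/hate/spammers/banners/annoy_me_please.gif"</span>, or
just <span class="QUOTE">"/ads/banners/annoying.html"</span>, or almost
an infinite number of other possible combinations, just so it has
<span class="QUOTE">"/banners/"</span> in the path somewhere after the
leading slash. A bare <span class="QUOTE">"/banners/annoying.html"</span>
would not match, since the pattern requires a slash on each side of
the first <span class="QUOTE">".*"</span>.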
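<p>The behaviour of this pattern can be checked with a short Python sketch. It assumes that the pattern is applied from the first character of the path, which <tt class="LITERAL">re.match()</tt> models, and it uses Python's regex engine as a stand-in rather than Privoxy itself. The helper name <tt class="LITERAL">is_banner_path</tt> is hypothetical.</p>

```python
import re

# Pattern taken from the text; re.match() anchors it at the start of the path.
BANNERS = re.compile(r"/.*/banners/.*")


def is_banner_path(path):
    """Return True when the path matches the banners pattern."""
    return BANNERS.match(path) is not None


for path in [
    "/eye/hate/spammers/banners/annoy_me_please.gif",
    "/ads/banners/annoying.html",
    "/banners/annoying.html",  # only one slash before "banners"
]:
    print(path, is_banner_path(path))
```

<p>The last path is included to show that the pattern needs a slash on both sides of the first <tt class="LITERAL">.*</tt>, so at least two slashes must precede <tt class="LITERAL">banners</tt>.</p>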
236 And now something a little more complex:
239 <span class="emphasis"><i class="EMPHASIS"><tt class=
240 "LITERAL">/.*/adv((er)?ts?|ertis(ing|ements?))?/</tt></i></span> -
241 We have several literal forward slashes again (<span class=
242 "QUOTE">"/"</span>), so we are building another expression that is
243 a file path statement. We have another <span class=
244 "QUOTE">".*"</span>, so we are matching against any conceivable
245 sub-path, just so it matches our expression. The only true literal
246 that <span class="emphasis"><i class="EMPHASIS">must
247 match</i></span> our pattern is <span class=
248 "APPLICATION">adv</span>, together with the forward slashes. What
comes after the <span class="QUOTE">"adv"</span> string is the
interesting part.
253 Remember the <span class="QUOTE">"?"</span> means the preceding
254 expression (either a literal character or anything grouped with
255 <span class="QUOTE">"(...)"</span> in this case) can exist or not,
256 since this means either zero or one match. So <span class=
257 "QUOTE">"((er)?ts?|ertis(ing|ements?))"</span> is optional, as are
258 the individual sub-expressions: <span class="QUOTE">"(er)"</span>,
259 <span class="QUOTE">"(ing|ements?)"</span>, and the <span class=
260 "QUOTE">"s"</span>. The <span class="QUOTE">"|"</span> means <span
261 class="QUOTE">"or"</span>. We have two of those. For instance,
262 <span class="QUOTE">"(ing|ements?)"</span>, can expand to match
263 either <span class="QUOTE">"ing"</span> <span class="emphasis"><i
264 class="EMPHASIS">OR</i></span> <span class=
265 "QUOTE">"ements?"</span>. What is being done here, is an attempt at
266 matching as many variations of <span class=
267 "QUOTE">"advertisement"</span>, and similar, as possible. So this
268 would expand to match just <span class="QUOTE">"adv"</span>, or
269 <span class="QUOTE">"advert"</span>, or <span class=
270 "QUOTE">"adverts"</span>, or <span class=
271 "QUOTE">"advertising"</span>, or <span class=
272 "QUOTE">"advertisement"</span>, or <span class=
273 "QUOTE">"advertisements"</span>. You get the idea. But it would not
274 match <span class="QUOTE">"advertizements"</span> (with a <span
275 class="QUOTE">"z"</span>). We could fix that by changing our
276 regular expression to: <span class=
277 "QUOTE">"/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/"</span>, which
278 would then match either spelling.
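<p>A small Python sketch can confirm which spellings the two patterns accept. The patterns are copied from the text; the surrounding path <tt class="LITERAL">/site/.../</tt> and the helper name <tt class="LITERAL">matches</tt> are assumptions made for this example, and Python's <tt class="LITERAL">re</tt> module again stands in for the real engine.</p>

```python
import re

# Both patterns are copied from the text above.
ADV = re.compile(r"/.*/adv((er)?ts?|ertis(ing|ements?))?/")
ADV_Z = re.compile(r"/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/")

WORDS = ["adv", "advert", "adverts", "advertising",
         "advertisement", "advertisements", "advertizements"]


def matches(pattern, word):
    """Test the word as a directory name inside a made-up path."""
    return pattern.match("/site/" + word + "/") is not None


for word in WORDS:
    print(word, matches(ADV, word), matches(ADV_Z, word))
```

<p>Only the spelling with a <tt class="LITERAL">z</tt> should differ between the two columns.</p>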
281 <span class="emphasis"><i class="EMPHASIS"><tt class=
282 "LITERAL">/.*/advert[0-9]+\.(gif|jpe?g)</tt></i></span> - Again
283 another path statement with forward slashes. Anything in the square
brackets <span class="QUOTE">"[ ]"</span> can be matched. This is
using <span class="QUOTE">"0-9"</span> as a shorthand expression to
mean any digit zero through nine. It is the same as saying <span
class="QUOTE">"0123456789"</span>. So any digit matches. The <span
class="QUOTE">"+"</span> means one or more of the preceding
expression must be included. The preceding expression here is what
is in the square brackets -- in this case, any digit zero through
nine. Then, at the end, we have a grouping: <span class=
292 "QUOTE">"(gif|jpe?g)"</span>. This includes a <span class=
293 "QUOTE">"|"</span>, so this needs to match the expression on either
294 side of that bar character also. A simple <span class=
295 "QUOTE">"gif"</span> on one side, and the other side will in turn
296 match either <span class="QUOTE">"jpeg"</span> or <span class=
297 "QUOTE">"jpg"</span>, since the <span class="QUOTE">"?"</span>
298 means the letter <span class="QUOTE">"e"</span> is optional and can
be matched once or not at all. So we are building an expression
here to match a GIF or JPEG image file. It must include
301 the literal string <span class="QUOTE">"advert"</span>, then one or
302 more digits, and a <span class="QUOTE">"."</span> (which is now a
303 literal, and not a special character, since it is escaped with
304 <span class="QUOTE">"\"</span>), and lastly either <span class=
305 "QUOTE">"gif"</span>, or <span class="QUOTE">"jpeg"</span>, or
306 <span class="QUOTE">"jpg"</span>. Some possible matches would
307 include: <span class="QUOTE">"//advert1.jpg"</span>, <span class=
308 "QUOTE">"/nasty/ads/advert1234.gif"</span>, <span class=
309 "QUOTE">"/banners/from/hell/advert99.jpg"</span>. It would not
310 match <span class="QUOTE">"advert1.gif"</span> (no leading slash),
311 or <span class="QUOTE">"/adverts232.jpg"</span> (the expression
312 does not include an <span class="QUOTE">"s"</span>), or <span
313 class="QUOTE">"/advert1.jsp"</span> (<span class=
314 "QUOTE">"jsp"</span> is not in the expression anywhere).
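<p>The matches and non-matches listed above can be verified with a few lines of Python. This is only a sketch: it assumes the pattern is applied from the start of the path (modelled with <tt class="LITERAL">re.match()</tt>), and the extra path <tt class="LITERAL">/x/advert0.jpeg</tt> is an invented case added to exercise the digit zero and the optional <tt class="LITERAL">e</tt>.</p>

```python
import re

# Pattern copied from the text above.
ADVERT = re.compile(r"/.*/advert[0-9]+\.(gif|jpe?g)")

# Expected results as stated in the text, plus one invented case.
CASES = {
    "//advert1.jpg": True,
    "/nasty/ads/advert1234.gif": True,
    "/banners/from/hell/advert99.jpg": True,
    "/x/advert0.jpeg": True,       # invented: digit zero and "jpeg"
    "advert1.gif": False,
    "/adverts232.jpg": False,
    "/advert1.jsp": False,
}


def is_advert(path):
    """Return True when the path matches the advert image pattern."""
    return ADVERT.match(path) is not None


for path, expected in CASES.items():
    print(path, is_advert(path), expected)
```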
317 We are barely scratching the surface of regular expressions here so
318 that you can understand the default <span class=
319 "APPLICATION">Privoxy</span> configuration files, and maybe use
320 this knowledge to customize your own installation. There is much,
321 much more that can be done with regular expressions. Now that you
322 know enough to get started, you can learn more on your own :/
325 More reading on Perl Compatible Regular expressions: <a href=
326 "http://perldoc.perl.org/perlre.html" target=
327 "_top">http://perldoc.perl.org/perlre.html</a>
330 For information on regular expression based substitutions and their
331 applications in filters, please see the <a href=
332 "filter-file.html">filter file tutorial</a> in this manual.
337 <a name="INTERNAL-PAGES">14.2. Privoxy's Internal Pages</a>
340 Since <span class="APPLICATION">Privoxy</span> proxies each
341 requested web page, it is easy for <span class=
342 "APPLICATION">Privoxy</span> to trap certain special URLs. In this
343 way, we can talk directly to <span class=
344 "APPLICATION">Privoxy</span>, and see how it is configured, see how
345 our rules are being applied, change these rules and other
346 configuration options, and even turn <span class=
"APPLICATION">Privoxy's</span> filtering off, all with a web
browser.
351 The URLs listed below are the special ones that allow direct access
352 to <span class="APPLICATION">Privoxy</span>. Of course, <span
353 class="APPLICATION">Privoxy</span> must be running to access these.
354 If not, you will get a friendly error message. Internet access is
355 not necessary either.
364 <a name="AEN5923"></a>
365 <blockquote class="BLOCKQUOTE">
367 <a href="http://config.privoxy.org/" target=
368 "_top">http://config.privoxy.org/</a>
372 There is a shortcut: <a href="http://p.p/" target=
373 "_top">http://p.p/</a> (But it doesn't provide a fall-back to a
374 real page, in case the request is not sent through <span class=
375 "APPLICATION">Privoxy</span>)
380 Show information about the current configuration, including
381 viewing and editing of actions files:
383 <a name="AEN5931"></a>
384 <blockquote class="BLOCKQUOTE">
386 <a href="http://config.privoxy.org/show-status" target=
387 "_top">http://config.privoxy.org/show-status</a>
393 Show the source code version numbers:
395 <a name="AEN5936"></a>
396 <blockquote class="BLOCKQUOTE">
398 <a href="http://config.privoxy.org/show-version" target=
399 "_top">http://config.privoxy.org/show-version</a>
405 Show the browser's request headers:
407 <a name="AEN5941"></a>
408 <blockquote class="BLOCKQUOTE">
410 <a href="http://config.privoxy.org/show-request" target=
411 "_top">http://config.privoxy.org/show-request</a>
417 Show which actions apply to a URL and why:
419 <a name="AEN5946"></a>
420 <blockquote class="BLOCKQUOTE">
422 <a href="http://config.privoxy.org/show-url-info" target=
423 "_top">http://config.privoxy.org/show-url-info</a>
429 Toggle Privoxy on or off. This feature can be turned off/on in
430 the main <tt class="FILENAME">config</tt> file. When toggled
431 <span class="QUOTE">"off"</span>, <span class=
432 "QUOTE">"Privoxy"</span> continues to run, but only as a
433 pass-through proxy, with no actions taking place:
435 <a name="AEN5954"></a>
436 <blockquote class="BLOCKQUOTE">
438 <a href="http://config.privoxy.org/toggle" target=
439 "_top">http://config.privoxy.org/toggle</a>
443 Short cuts. Turn off, then on:
445 <a name="AEN5958"></a>
446 <blockquote class="BLOCKQUOTE">
<a href="http://config.privoxy.org/toggle?set=disable" target=
"_top">http://config.privoxy.org/toggle?set=disable</a>
453 <a name="AEN5961"></a>
454 <blockquote class="BLOCKQUOTE">
456 <a href="http://config.privoxy.org/toggle?set=enable" target=
457 "_top">http://config.privoxy.org/toggle?set=enable</a>
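<p>For scripting, the addresses listed above can be assembled with a small helper. The following Python sketch only builds the URLs; the function name <tt class="LITERAL">internal_url</tt> is hypothetical, and actually fetching a page would require the request to be sent through a running Privoxy.</p>

```python
from urllib.parse import urlencode

# Base address of the internal pages, as given in the text.
BASE = "http://config.privoxy.org/"


def internal_url(page="", **params):
    """Build the address of one of the internal pages listed above."""
    url = BASE + page
    if params:
        url += "?" + urlencode(params)
    return url


print(internal_url("show-status"))
print(internal_url("show-url-info"))
print(internal_url("toggle", set="disable"))
print(internal_url("toggle", set="enable"))
```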
465 <a name="CHAIN">14.3. Chain of Events</a>
468 Let's take a quick look at how some of <span class=
469 "APPLICATION">Privoxy's</span> core features are triggered, and the
ensuing sequence of events when a web page is requested by your
browser:
478 First, your web browser requests a web page. The browser knows
479 to send the request to <span class=
480 "APPLICATION">Privoxy</span>, which will in turn, relay the
request to the remote web server after passing the following
tests:
487 <span class="APPLICATION">Privoxy</span> traps any request for
488 its own internal CGI pages (e.g <a href="http://p.p/" target=
"_top">http://p.p/</a>) and sends the CGI page back to the
browser.
495 Next, <span class="APPLICATION">Privoxy</span> checks to see if
496 the URL matches any <a href="actions-file.html#BLOCK"><span
497 class="QUOTE">"+block"</span></a> patterns. If so, the URL is
498 then blocked, and the remote web server will not be contacted.
499 <a href="actions-file.html#HANDLE-AS-IMAGE"><span class=
500 "QUOTE">"+handle-as-image"</span></a> and <a href=
501 "actions-file.html#HANDLE-AS-EMPTY-DOCUMENT"><span class=
502 "QUOTE">"+handle-as-empty-document"</span></a> are then
503 checked, and if there is no match, an HTML <span class=
504 "QUOTE">"BLOCKED"</span> page is sent back to the browser.
505 Otherwise, if it does match, an image is returned for the
506 former, and an empty text document for the latter. The type of
507 image would depend on the setting of <a href=
508 "actions-file.html#SET-IMAGE-BLOCKER"><span class=
509 "QUOTE">"+set-image-blocker"</span></a> (blank, checkerboard
510 pattern, or an HTTP redirect to an image elsewhere).
515 Untrusted URLs are blocked. If URLs are being added to the <tt
516 class="FILENAME">trust</tt> file, then that is done.
521 If the URL pattern matches the <a href=
522 "actions-file.html#FAST-REDIRECTS"><span class=
523 "QUOTE">"+fast-redirects"</span></a> action, it is then
524 processed. Unwanted parts of the requested URL are stripped.
529 Now the rest of the client browser's request headers are
530 processed. If any of these match any of the relevant actions
531 (e.g. <a href="actions-file.html#HIDE-USER-AGENT"><span class=
532 "QUOTE">"+hide-user-agent"</span></a>, etc.), headers are
suppressed or forged as determined by these actions and their
parameters.
539 Now the web server starts sending its response back (i.e.
540 typically a web page).
545 First, the server headers are read and processed to determine,
546 among other things, the MIME type (document type) and encoding.
547 The headers are then filtered as determined by the <a href=
548 "actions-file.html#CRUNCH-INCOMING-COOKIES"><span class=
549 "QUOTE">"+crunch-incoming-cookies"</span></a>, <a href=
550 "actions-file.html#SESSION-COOKIES-ONLY"><span class=
551 "QUOTE">"+session-cookies-only"</span></a>, and <a href=
552 "actions-file.html#DOWNGRADE-HTTP-VERSION"><span class=
553 "QUOTE">"+downgrade-http-version"</span></a> actions.
558 If any <a href="actions-file.html#FILTER"><span class=
559 "QUOTE">"+filter"</span></a> action or <a href=
560 "actions-file.html#DEANIMATE-GIFS"><span class=
561 "QUOTE">"+deanimate-gifs"</span></a> action applies (and the
562 document type fits the action), the rest of the page is read
563 into memory (up to a configurable limit). Then the filter rules
564 (from <tt class="FILENAME">default.filter</tt> and any other
565 filter files) are processed against the buffered content.
566 Filters are applied in the order they are specified in one of
567 the filter files. Animated GIFs, if present, are reduced to
568 either the first or last frame, depending on the action
setting. The entire page, which is now filtered, is then sent by
570 <span class="APPLICATION">Privoxy</span> back to your browser.
If neither a <a href="actions-file.html#FILTER"><span class=
"QUOTE">"+filter"</span></a> action nor a <a href=
575 "actions-file.html#DEANIMATE-GIFS"><span class=
576 "QUOTE">"+deanimate-gifs"</span></a> matches, then <span class=
577 "APPLICATION">Privoxy</span> passes the raw data through to the
578 client browser as it becomes available.
As the browser receives the (now possibly filtered) page
584 content, it reads and then requests any URLs that may be
585 embedded within the page source, e.g. ad images, stylesheets,
586 JavaScript, other HTML documents (e.g. frames), sounds, etc.
587 For each of these objects, the browser issues a separate
588 request (this is easily viewable in <span class=
589 "APPLICATION">Privoxy's</span> logs). And each such request is
590 in turn processed just as above. Note that a complex web page
591 will have many, many such embedded URLs. If these secondary
requests are to a different server, then quite possibly a very
different set of actions is triggered.
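<p>The decisions in the steps above can be summarised as a toy model. The Python sketch below is not how Privoxy is implemented: it leaves out the trust check, redirect handling, header processing and the document-type test, the function names and the <tt class="LITERAL">actions</tt> set are hypothetical, and the order in which the two <tt class="LITERAL">handle-as</tt> actions are tested is an arbitrary choice.</p>

```python
INTERNAL_PREFIXES = ("http://p.p/", "http://config.privoxy.org/")


def handle_request(url, actions):
    """Toy model of the request-side decisions described above.

    `actions` is a hypothetical set of action names that apply to the URL;
    real Privoxy derives it from the actions files.
    """
    if url.startswith(INTERNAL_PREFIXES):
        return "internal CGI page"
    if "block" in actions:
        # The order of these two checks is an assumption of this sketch.
        if "handle-as-image" in actions:
            return "replacement image"
        if "handle-as-empty-document" in actions:
            return "empty document"
        return "BLOCKED page"
    return "forward to server"


def handle_response(actions):
    """Buffer and filter the body only if a content action applies."""
    if "filter" in actions or "deanimate-gifs" in actions:
        return "buffer, filter, then send"
    return "pass through as data arrives"


print(handle_request("http://p.p/", set()))
print(handle_request("http://ads.example/banner.gif", {"block", "handle-as-image"}))
print(handle_request("http://www.example.net/", {"filter"}))
print(handle_response({"filter"}))
```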
599 NOTE: This is somewhat of a simplistic overview of what happens
600 with each URL request. For the sake of brevity and simplicity, we
have focused on <span class="APPLICATION">Privoxy's</span> core
features.
<a name="ACTIONSANAT">14.4. Troubleshooting: Anatomy of an
Action</a>
The way <span class="APPLICATION">Privoxy</span> applies <a href=
"actions-file.html#ACTIONS">actions</a> and <a href=
"actions-file.html#FILTER">filters</a> to any given URL can be
complex, and it is not always easy to understand what is happening.
Sometimes we need to be able to <span class="emphasis"><i
class="EMPHASIS">see</i></span> just what <span class=
"APPLICATION">Privoxy</span> is doing, especially if something
<span class="APPLICATION">Privoxy</span> is doing is inadvertently
causing us a problem. It can be a little daunting to look at the
actions and filters files themselves, since they tend to be filled
with <a href="appendix.html#REGEX">regular expressions</a> whose
consequences are not always so obvious.
625 One quick test to see if <span class="APPLICATION">Privoxy</span>
626 is causing a problem or not, is to disable it temporarily. This
627 should be the first troubleshooting step (be sure to flush caches
628 afterward!). Looking at the logs is a good idea too. (Note that
629 both the toggle feature and logging are enabled via <tt class=
630 "FILENAME">config</tt> file settings, and may need to be turned
631 <span class="QUOTE">"on"</span>.)
Another easy troubleshooting step, if you have customized your
installation, is to revert to the installed defaults and see if
that helps. There are times the developers get complaints about one
thing or another, and the problem turns out to be caused by a
customized configuration.
641 <span class="APPLICATION">Privoxy</span> also provides the <a href=
642 "http://config.privoxy.org/show-url-info" target=
643 "_top">http://config.privoxy.org/show-url-info</a> page that can
644 show us very specifically how <span class=
645 "APPLICATION">actions</span> are being applied to any given URL.
646 This is a big help for troubleshooting.
649 First, enter one URL (or partial URL) at the prompt, and then <span
650 class="APPLICATION">Privoxy</span> will tell us how the current
651 configuration will handle it. This will not help with filtering
652 effects (i.e. the <a href="actions-file.html#FILTER"><span class=
653 "QUOTE">"+filter"</span></a> action) from one of the filter files
since this is handled very differently and is not so easy to trap! It
655 also will not tell you about any other URLs that may be embedded
656 within the URL you are testing. For instance, images such as ads
657 are expressed as URLs within the raw page source of HTML pages. So
658 you will only get info for the actual URL that is pasted into the
659 prompt area -- not any sub-URLs. If you want to know about embedded
660 URLs like ads, you will have to dig those out of the HTML source.
661 Use your browser's <span class="QUOTE">"View Page Source"</span>
662 option for this. Or right click on the ad, and grab the URL.
665 Let's try an example, <a href="http://google.com" target=
666 "_top">google.com</a>, and look at it one section at a time in a
667 sample configuration (your real configuration may vary):
671 <table border="0" bgcolor="#E0E0E0" width="100%">
675 Matches for http://www.google.com:
677 In file: default.action <span class="GUIBUTTON">[ View ]</span> <span class=
678 "GUIBUTTON">[ Edit ]</span>
680 {+change-x-forwarded-for{block}
681 +deanimate-gifs {last}
682 +fast-redirects {check-decoded-url}
683 +filter {refresh-tags}
684 +filter {img-reorder}
685 +filter {banners-by-size}
687 +filter {jumping-windows}
688 +filter {ie-exploits}
689 +hide-from-header {block}
690 +hide-referrer {forge}
691 +session-cookies-only
692 +set-image-blocker {pattern}
695 { -session-cookies-only }
701 In file: user.action <span class="GUIBUTTON">[ View ]</span> <span class=
702 "GUIBUTTON">[ Edit ]</span>
703 (no matches in this file)
710 This is telling us how we have defined our <a href=
711 "actions-file.html#ACTIONS"><span class=
712 "QUOTE">"actions"</span></a>, and which ones match for our test
case, <span class="QUOTE">"google.com"</span>. Displayed are all the
714 actions that are available to us. Remember, the <tt class=
715 "LITERAL">+</tt> sign denotes <span class="QUOTE">"on"</span>. <tt
716 class="LITERAL">-</tt> denotes <span class="QUOTE">"off"</span>. So
717 some are <span class="QUOTE">"on"</span> here, but many are <span
718 class="QUOTE">"off"</span>. Each example we try may provide a
slightly different end result, depending on our configuration
directives.
723 The first listing is for our <tt class=
"FILENAME">default.action</tt> file. The large, multi-line listing
725 is how the actions are set to match for all URLs, i.e. our default
726 settings. If you look at your <span class="QUOTE">"actions"</span>
727 file, this would be the section just below the <span class=
728 "QUOTE">"aliases"</span> section near the top. This will apply to
729 all URLs as signified by the single forward slash at the end of the
730 listing -- <span class="QUOTE">" / "</span>.
733 But we have defined additional actions that would be exceptions to
734 these general rules, and then we list specific URLs (or patterns)
735 that these exceptions would apply to. Last match wins. Just below
736 this then are two explicit matches for <span class=
737 "QUOTE">".google.com"</span>. The first is negating our previous
738 cookie setting, which was for <a href=
739 "actions-file.html#SESSION-COOKIES-ONLY"><span class=
740 "QUOTE">"+session-cookies-only"</span></a> (i.e. not persistent).
741 So we will allow persistent cookies for google, at least that is
742 how it is in this example. The second turns <span class=
743 "emphasis"><i class="EMPHASIS">off</i></span> any <a href=
744 "actions-file.html#FAST-REDIRECTS"><span class=
745 "QUOTE">"+fast-redirects"</span></a> action, allowing this to take
746 place unmolested. Note that there is a leading dot here -- <span
747 class="QUOTE">".google.com"</span>. This will match any hosts and
748 sub-domains, in the google.com domain also, such as <span class=
749 "QUOTE">"www.google.com"</span> or <span class=
750 "QUOTE">"mail.google.com"</span>. But it would not match <span
751 class="QUOTE">"www.google.de"</span>! So, apparently, we have these
752 two actions defined as exceptions to the general rules at the top
753 somewhere in the lower part of our <tt class=
754 "FILENAME">default.action</tt> file, and <span class=
"QUOTE">"google.com"</span> is referenced somewhere in these latter
sections.
759 Then, for our <tt class="FILENAME">user.action</tt> file, we again
760 have no hits. So there is nothing google-specific that we might
761 have added to our own, local configuration. If there was, those
762 actions would over-rule any actions from previously processed
763 files, such as <tt class="FILENAME">default.action</tt>. <tt class=
764 "FILENAME">user.action</tt> typically has the last word. This is
the best place to put hard and fast exceptions.
768 And finally we pull it all together in the bottom section and
769 summarize how <span class="APPLICATION">Privoxy</span> is applying
770 all its <span class="QUOTE">"actions"</span> to <span class=
771 "QUOTE">"google.com"</span>:
775 <table border="0" bgcolor="#E0E0E0" width="100%">
783 +change-x-forwarded-for{block}
784 -client-header-filter{hide-tor-exit-notation}
785 -content-type-overwrite
786 -crunch-client-header
787 -crunch-if-none-match
788 -crunch-incoming-cookies
789 -crunch-outgoing-cookies
790 -crunch-server-header
791 +deanimate-gifs {last}
792 -downgrade-http-version
795 -filter {content-cookies}
797 -filter {banners-by-link}
798 -filter {tiny-textforms}
799 -filter {frameset-borders}
800 -filter {demoronizer}
801 -filter {shockwave-flash}
802 -filter {quicktime-kioskmode}
804 -filter {crude-parental}
805 -filter {site-specifics}
806 -filter {js-annoyances}
807 -filter {html-annoyances}
808 +filter {refresh-tags}
809 -filter {unsolicited-popups}
810 +filter {img-reorder}
811 +filter {banners-by-size}
813 +filter {jumping-windows}
814 +filter {ie-exploits}
821 -handle-as-empty-document
823 -hide-accept-language
824 -hide-content-disposition
825 +hide-from-header {block}
826 -hide-if-modified-since
827 +hide-referrer {forge}
830 -overwrite-last-modified
833 -server-header-filter{xml-to-html}
834 -server-header-filter{html-to-xml}
835 -session-cookies-only
836 +set-image-blocker {pattern}
Notice that the only differences from the previous listing are
<span class="QUOTE">"fast-redirects"</span> and <span class=
"QUOTE">"session-cookies-only"</span>, which are overridden
specifically for this site in our configuration, and thus show up
differently in the <span class="QUOTE">"Final Results"</span>.
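<p>The way the final result is assembled (general defaults first, then site-specific exceptions, with the last match winning) can be sketched in Python. This is a simplified model, not Privoxy's code: the section list is a hypothetical excerpt, <tt class="LITERAL">True</tt> and <tt class="LITERAL">False</tt> stand for <tt class="LITERAL">+</tt> and <tt class="LITERAL">-</tt>, and the domain check only handles the leading-dot form discussed above.</p>

```python
def domain_matches(pattern, host):
    """Simplified check for patterns with a leading dot, e.g. ".google.com"."""
    if pattern.startswith("."):
        bare = pattern[1:]
        return host == bare or host.endswith(pattern)
    return host == pattern


def final_actions(sections, host):
    """Apply matching sections in file order; later settings override earlier ones."""
    result = {}
    for pattern, settings in sections:
        if pattern == "/" or domain_matches(pattern, host):
            result.update(settings)
    return result


# Hypothetical excerpt of an actions file: True stands for "+", False for "-".
SECTIONS = [
    ("/", {"session-cookies-only": True, "fast-redirects": True,
           "hide-referrer": True}),
    (".google.com", {"session-cookies-only": False}),
    (".google.com", {"fast-redirects": False}),
]

print(final_actions(SECTIONS, "www.google.com"))
print(final_actions(SECTIONS, "www.google.de"))
```

<p>For <tt class="LITERAL">www.google.de</tt> only the defaults apply, because the leading-dot pattern does not match that host.</p>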
850 Now another example, <span class=
851 "QUOTE">"ad.doubleclick.net"</span>:
855 <table border="0" bgcolor="#E0E0E0" width="100%">
859 { +block{Domains starts with "ad"} }
862 { +block{Domain contains "ad"} }
865 { +block{Doubleclick banner server} +handle-as-image }
866 .[a-vx-z]*.doubleclick.net
873 We'll just show the interesting part here - the explicit matches.
874 It is matched three different times. Two <span class=
875 "QUOTE">"+block{}"</span> sections, and a <span class=
876 "QUOTE">"+block{} +handle-as-image"</span>, which is the expanded
877 form of one of our aliases that had been defined as: <span class=
878 "QUOTE">"+block-as-image"</span>. (<a href=
879 "actions-file.html#ALIASES"><span class=
880 "QUOTE">"Aliases"</span></a> are defined in the first section of
the actions file and typically used to combine more than one
action.)
885 Any one of these would have done the trick and blocked this as an
886 unwanted image. This is unnecessarily redundant since the last case
887 effectively would also cover the first. No point in taking chances
888 with these guys though ;-) Note that if you want an ad or obnoxious
889 URL to be invisible, it should be defined as <span class=
890 "QUOTE">"ad.doubleclick.net"</span> is done here -- as both a <a
891 href="actions-file.html#BLOCK"><span class=
892 "QUOTE">"+block{}"</span></a> <span class="emphasis"><i class=
893 "EMPHASIS">and</i></span> an <a href=
894 "actions-file.html#HANDLE-AS-IMAGE"><span class=
895 "QUOTE">"+handle-as-image"</span></a>. The custom alias <span
896 class="QUOTE">"<tt class="LITERAL">+block-as-image</tt>"</span>
just simplifies the process and makes it more readable.
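<p>Alias expansion of this kind amounts to a simple substitution. The Python sketch below illustrates the idea with a hypothetical alias table; the block reason text and the function name <tt class="LITERAL">expand</tt> are assumptions made for the example, not taken from a real actions file.</p>

```python
# Hypothetical alias table modelled on the example in the text.
ALIASES = {
    "+block-as-image": ["+block{Blocked image.}", "+handle-as-image"],
}


def expand(actions):
    """Replace each alias with the actions it stands for."""
    expanded = []
    for action in actions:
        expanded.extend(ALIASES.get(action, [action]))
    return expanded


print(expand(["+block-as-image", "+hide-referrer{forge}"]))
```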
900 One last example. Let's try <span class=
901 "QUOTE">"http://www.example.net/adsl/HOWTO/"</span>. This one is
902 giving us problems. We are getting a blank page. Hmmm ...
906 <table border="0" bgcolor="#E0E0E0" width="100%">
910 Matches for http://www.example.net/adsl/HOWTO/:
912 In file: default.action <span class="GUIBUTTON">[ View ]</span> <span class=
913 "GUIBUTTON">[ Edit ]</span>
917 +change-x-forwarded-for{block}
918 -client-header-filter{hide-tor-exit-notation}
919 -content-type-overwrite
920 -crunch-client-header
921 -crunch-if-none-match
922 -crunch-incoming-cookies
923 -crunch-outgoing-cookies
924 -crunch-server-header
926 -downgrade-http-version
927 +fast-redirects {check-decoded-url}
929 -filter {content-cookies}
931 -filter {banners-by-link}
932 -filter {tiny-textforms}
933 -filter {frameset-borders}
934 -filter {demoronizer}
935 -filter {shockwave-flash}
936 -filter {quicktime-kioskmode}
938 -filter {crude-parental}
939 -filter {site-specifics}
940 -filter {js-annoyances}
941 -filter {html-annoyances}
942 +filter {refresh-tags}
943 -filter {unsolicited-popups}
944 +filter {img-reorder}
945 +filter {banners-by-size}
947 +filter {jumping-windows}
948 +filter {ie-exploits}
955 -handle-as-empty-document
957 -hide-accept-language
958 -hide-content-disposition
959 +hide-from-header{block}
962 -overwrite-last-modified
965 -server-header-filter{xml-to-html}
966 -server-header-filter{html-to-xml}
967 +session-cookies-only
968 +set-image-blocker{blank} }
971 { +block{Path contains "ads".} +handle-as-image }
979 Oops, the <span class="QUOTE">"/adsl/"</span> is matching <span
980 class="QUOTE">"/ads"</span> in our configuration! But we did not
981 want this at all! Now we see why we get the blank page. It is
982 actually triggering two different actions here, and the effects are
983 aggregated so that the URL is blocked, and <span class=
984 "APPLICATION">Privoxy</span> is told to treat the block as if it
985 were an image. But this is, of course, all wrong. We could now add
986 a new action below this (or better in our own <tt class=
987 "FILENAME">user.action</tt> file) that explicitly <span class=
988 "emphasis"><i class="EMPHASIS">un</i></span>blocks (<a href=
989 "actions-file.html#BLOCK"><span class=
990 "QUOTE">"{-block}"</span></a>) paths with <span class=
991 "QUOTE">"adsl"</span> in them (remember, last match in the
992 configuration wins). There are various ways to handle such
997 <table border="0" bgcolor="#E0E0E0" width="100%">
1000 <pre class="SCREEN">
1009 Now the page displays ;-) Remember to flush your browser's caches
1010 when making these kinds of changes to your configuration to ensure
1011 that you get a freshly delivered page! Or, try using <tt class=
1012 "LITERAL">Shift+Reload</tt>.
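<p>
The unblocking exception described above might be sketched as follows.
This is a minimal example, assuming it is placed in <tt class=
"FILENAME">user.action</tt>; the path pattern <tt class=
"LITERAL">/adsl</tt> is an illustrative choice and applies to every
host, so a narrower pattern such as <tt class=
"LITERAL">www.example.net/adsl</tt> may suit your setup better.
</p>
<table border="0" bgcolor="#E0E0E0" width="100%">
<tr>
<td>
<pre class="SCREEN">
# Exception for the false positive: paths that start with /adsl
# are not ads, so undo the earlier +block (the last match wins)
{ -block }
/adsl
</pre>
</td>
</tr>
</table>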
1015 But now what about a situation where we get no explicit matches
1020 <table border="0" bgcolor="#E0E0E0" width="100%">
1023 <pre class="SCREEN">
1024 { +block{Path starts with "ads".} +handle-as-image }
1032 That actually was very helpful and pointed us quickly to where the
1033 problem was. If you don't get this kind of match, then it means one
1034 of the default rules in the first section of <tt class=
1035 "FILENAME">default.action</tt> is causing the problem. This would
1036 require some guesswork, and maybe a little trial and error to
1037 isolate the offending rule. One likely cause would be one of the <a
1038 href="actions-file.html#FILTER"><span class=
1039 "QUOTE">"+filter"</span></a> actions. These tend to be harder to
1040 troubleshoot. Try adding the URL for the site to one of the aliases
1041 that turn off <a href="actions-file.html#FILTER"><span class=
1042 "QUOTE">"+filter"</span></a>:
1046 <table border="0" bgcolor="#E0E0E0" width="100%">
1049 <pre class="SCREEN">
1052 .worldpay.com # for quietpc.com
1062 <span class="QUOTE">"<tt class="LITERAL">{ shop }</tt>"</span> is
1063 an <span class="QUOTE">"alias"</span> that expands to <span class=
1064 "QUOTE">"<tt class="LITERAL">{ -filter -session-cookies-only
1065 }</tt>"</span>. Or you could do your own exception to negate
1070 <table border="0" bgcolor="#E0E0E0" width="100%">
1073 <pre class="SCREEN">
1075 # Disable ALL filter actions for sites in this section
1085 This would turn off all filtering for these sites. This is best put
1086 in <tt class="FILENAME">user.action</tt>, for local site
1087 exceptions. Note that when a simple domain pattern is used by
1088 itself (without the subsequent path portion), all sub-pages within
1089 that domain are included automatically in the scope of the action.
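<p>
To illustrate the difference in scope between a domain-only pattern and
a pattern with a path, here is a small sketch. The host names are
hypothetical placeholders, not sites that are known to need an
exception.
</p>
<table border="0" bgcolor="#E0E0E0" width="100%">
<tr>
<td>
<pre class="SCREEN">
{ -filter }
# Domain only: every page on this site is covered
www.example.com
# Domain plus path: only URLs whose path starts with /forum/
www.example.org/forum/
</pre>
</td>
</tr>
</table>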
1092 Images that are inexplicably being blocked may well be hitting the
1093 <a href="actions-file.html#FILTER-BANNERS-BY-SIZE"><span class=
1094 "QUOTE">"+filter{banners-by-size}"</span></a> rule, which assumes
1095 that images of certain sizes are ad banners (works well <span
1096 class="emphasis"><i class="EMPHASIS">most of the time</i></span>
1097 since these tend to be standardized).
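<p>
If that filter turns out to be the culprit, one possible exception
switches off only this filter for the affected site and leaves all
other filters in place. The host name below is a hypothetical
placeholder.
</p>
<table border="0" bgcolor="#E0E0E0" width="100%">
<tr>
<td>
<pre class="SCREEN">
# Keep legitimate banner-sized images on this site
{ -filter{banners-by-size} }
images.example.com
</pre>
</td>
</tr>
</table>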
1100 <span class="QUOTE">"<tt class="LITERAL">{ fragile }</tt>"</span>
1101 is an alias that disables most of the actions that are most likely to
1102 cause trouble. This can be used as a last resort for problem sites.
1106 <table border="0" bgcolor="#E0E0E0" width="100%">
1109 <pre class="SCREEN">
1111 # Handle with care: easy to break
1120 <span class="emphasis"><i class="EMPHASIS">Remember to flush
1121 caches!</i></span> Note that the <tt class=
1122 "LITERAL">mail.google</tt> reference lacks the TLD portion (e.g.
1123 <span class="QUOTE">".com"</span>). This will effectively match
1124 <tt class="LITERAL">mail.google</tt> under any TLD, such as <tt class=
1125 "LITERAL">mail.google.de</tt>, just as an example.
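<p>
A short sketch of how the dots in a domain pattern change what is
matched. It assumes the <tt class="LITERAL">fragile</tt> alias is
defined in the alias section of the same file, and <tt class=
"LITERAL">.example.com</tt> is a hypothetical placeholder.
</p>
<table border="0" bgcolor="#E0E0E0" width="100%">
<tr>
<td>
<pre class="SCREEN">
{ fragile }
# Trailing dot: matches mail.google.com, mail.google.de, and so on
mail.google.
# Leading dot: matches example.com and all of its subdomains
.example.com
</pre>
</td>
</tr>
</table>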
1128 If this still does not work, you will have to go through the
1129 remaining actions one by one to find which one(s) is causing the
1134 <div class="NAVFOOTER">
1135 <hr align="LEFT" width="100%">
1136 <table summary="Footer navigation table" width="100%" border="0"
1137 cellpadding="0" cellspacing="0">
1139 <td width="33%" align="left" valign="top">
1140 <a href="seealso.html" accesskey="P">Prev</a>
1142 <td width="34%" align="center" valign="top">
1143 <a href="index.html" accesskey="H">Home</a>
1145 <td width="33%" align="right" valign="top">
1150 <td width="33%" align="left" valign="top">
1153 <td width="34%" align="center" valign="top">
1156 <td width="33%" align="right" valign="top">