<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
5 <meta name="generator" content="HTML Tidy, see www.w3.org">
9 <meta name="GENERATOR" content=
10 "Modular DocBook HTML Stylesheet Version 1.79">
11 <link rel="HOME" title="Privoxy 3.0.18 User Manual" href="index.html">
12 <link rel="PREVIOUS" title="See Also" href="seealso.html">
13 <link rel="STYLESHEET" type="text/css" href="../p_doc.css">
14 <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
15 <link rel="STYLESHEET" type="text/css" href="p_doc.css">
16 <style type="text/css">
18 background-color: #EEEEEE;
21 :link { color: #0000FF }
22 :visited { color: #840084 }
23 :active { color: #0000FF }
24 hr.c1 {text-align: left}
28 <div class="NAVHEADER">
29 <table summary="Header navigation table" width="100%" border="0"
30 cellpadding="0" cellspacing="0">
32 <th colspan="3" align="center">
33 Privoxy 3.0.18 User Manual
37 <td width="10%" align="left" valign="bottom">
38 <a href="seealso.html" accesskey="P">Prev</a>
40 <td width="80%" align="center" valign="bottom">
42 <td width="10%" align="right" valign="bottom">
47 <hr width="100%" class="c1">
51 <a name="APPENDIX">14. Appendix</a>
55 <a name="REGEX">14.1. Regular Expressions</a>
58 <span class="APPLICATION">Privoxy</span> uses Perl-style <span
59 class="QUOTE">"regular expressions"</span> in its <a href=
60 "actions-file.html">actions files</a> and <a href=
61 "filter-file.html">filter file</a>, through the <a href=
62 "http://www.pcre.org/" target="_top">PCRE</a> and <span class=
63 "APPLICATION">PCRS</span> libraries.
66 If you are reading this, you probably don't understand what <span
67 class="QUOTE">"regular expressions"</span> are, or what they can
68 do. So this will be a very brief introduction only. A full
69 explanation would require a <a href=
70 "http://www.oreilly.com/catalog/regex/" target="_top">book</a> ;-)
73 Regular expressions provide a language to describe patterns that
can be run against strings of characters (letters, numbers, etc.) to
75 see if they match the string or not. The patterns are themselves
76 (sometimes complex) strings of literal characters, combined with
77 wild-cards, and other special characters, called meta-characters.
78 The <span class="QUOTE">"meta-characters"</span> have special
79 meanings and are used to build complex patterns to be matched
80 against. Perl Compatible Regular Expressions are an especially
81 convenient <span class="QUOTE">"dialect"</span> of the regular
85 To make a simple analogy, we do something similar when we use
86 wild-card characters when listing files with the <b class=
87 "COMMAND">dir</b> command in DOS. <tt class="LITERAL">*.*</tt>
88 matches all filenames. The <span class="QUOTE">"special"</span>
89 character here is the asterisk which matches any and all
90 characters. We can be more specific and use <tt class=
91 "LITERAL">?</tt> to match just individual characters. So <span
class="QUOTE">"dir file?.txt"</span> would match <span class=
93 "QUOTE">"file1.txt"</span>, <span class="QUOTE">"file2.txt"</span>,
94 etc. We are pattern matching, using a similar technique to <span
95 class="QUOTE">"regular expressions"</span>!
98 Regular expressions do essentially the same thing, but are much,
99 much more powerful. There are many more <span class=
100 "QUOTE">"special characters"</span> and ways of building complex
101 patterns however. Let's look at a few of the common ones, and then
108 <span class="emphasis"><i class="EMPHASIS">.</i></span> -
109 Matches any single character, e.g. <span class=
110 "QUOTE">"a"</span>, <span class="QUOTE">"A"</span>, <span
111 class="QUOTE">"4"</span>, <span class="QUOTE">":"</span>, or
112 <span class="QUOTE">"@"</span>.
122 <span class="emphasis"><i class="EMPHASIS">?</i></span> - The
123 preceding character or expression is matched ZERO or ONE
134 <span class="emphasis"><i class="EMPHASIS">+</i></span> - The
135 preceding character or expression is matched ONE or MORE
146 <span class="emphasis"><i class="EMPHASIS">*</i></span> - The
147 preceding character or expression is matched ZERO or MORE
158 <span class="emphasis"><i class="EMPHASIS">\</i></span> - The
159 <span class="QUOTE">"escape"</span> character denotes that
160 the following character should be taken literally. This is
161 used where one of the special characters (e.g. <span class=
162 "QUOTE">"."</span>) needs to be taken literally and not as a
163 special meta-character. Example: <span class=
164 "QUOTE">"example\.com"</span>, makes sure the period is
165 recognized only as a period (and not expanded to its
166 meta-character meaning of any single character).
176 <span class="emphasis"><i class="EMPHASIS">[ ]</i></span> -
177 Characters enclosed in brackets will be matched if any of the
178 enclosed characters are encountered. For instance, <span
179 class="QUOTE">"[0-9]"</span> matches any numeric digit (zero
180 through nine). As an example, we can combine this with <span
class="QUOTE">"+"</span> to match any digit one or more
182 times: <span class="QUOTE">"[0-9]+"</span>.
192 <span class="emphasis"><i class="EMPHASIS">( )</i></span> -
193 parentheses are used to group a sub-expression, or multiple
204 <span class="emphasis"><i class="EMPHASIS">|</i></span> - The
205 <span class="QUOTE">"bar"</span> character works like an
206 <span class="QUOTE">"or"</span> conditional statement. A
207 match is successful if the sub-expression on either side of
208 <span class="QUOTE">"|"</span> matches. As an example: <span
209 class="QUOTE">"/(this|that) example/"</span> uses grouping
210 and the bar character and would match either <span class=
211 "QUOTE">"this example"</span> or <span class="QUOTE">"that
212 example"</span>, and nothing else.
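<p>The meta-characters listed above can be tried out in a few lines of Python. This is only a sketch: it uses Python's <tt class="LITERAL">re</tt> module as a stand-in for PCRE (the two agree on these basic constructs), and the sample strings and the <tt class="LITERAL">check</tt> helper are made up for illustration.</p>

```python
import re

# Each entry: (pattern, text, whether the whole text should match).
CASES = [
    (r".", "@", True),                        # . matches any single character
    (r"colou?r", "color", True),              # ? makes the preceding "u" optional
    (r"colou?r", "colouur", False),           # ...but allows it at most once
    (r"[0-9]+", "2024", True),                # + needs one or more digits
    (r"[0-9]+", "", False),
    (r"[0-9]*", "", True),                    # * also accepts zero digits
    (r"example\.com", "example.com", True),   # \. is a literal period
    (r"example\.com", "exampleXcom", False),
    (r"example.com", "exampleXcom", True),    # an unescaped . matches "X"
    (r"(this|that) example", "that example", True),
    (r"(this|that) example", "other example", False),
]


def check(pattern, text):
    """Return True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


# One boolean per case, in the same order as CASES.
results = [check(pattern, text) for pattern, text, _ in CASES]
```

<p>Comparing <tt class="LITERAL">results</tt> with the third column of <tt class="LITERAL">CASES</tt> shows whether each pattern behaves as described.</p>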
These are just some of the ones you are likely to use when matching
URLs with <span class="APPLICATION">Privoxy</span>, and are a long
way from a definitive list. This is enough to get us started with a
few simple examples which may be more illuminating:
225 <span class="emphasis"><i class="EMPHASIS"><tt class=
226 "LITERAL">/.*/banners/.*</tt></i></span> - A simple example that
227 uses the common combination of <span class="QUOTE">"."</span> and
228 <span class="QUOTE">"*"</span> to denote any character, zero or
229 more times. In other words, any string at all. So we start with a
230 literal forward slash, then our regular expression pattern (<span
231 class="QUOTE">".*"</span>) another literal forward slash, the
232 string <span class="QUOTE">"banners"</span>, another forward slash,
233 and lastly another <span class="QUOTE">".*"</span>. We are building
234 a directory path here. This will match any file with the path that
235 has a directory named <span class="QUOTE">"banners"</span> in it.
236 The <span class="QUOTE">".*"</span> matches any characters, and
237 this could conceivably be more forward slashes, so it might expand
238 into a much longer looking path. For example, this could match:
240 "QUOTE">"/eye/hate/spammers/banners/annoy_me_please.gif"</span>, or
a shorter path such as <span class="QUOTE">"/ads/banners/annoying.html"</span>, or almost
242 an infinite number of other possible combinations, just so it has
243 <span class="QUOTE">"banners"</span> in the path somewhere.
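<p>A quick way to experiment with this pattern is sketched below. It assumes that the pattern is anchored at the start of the path, which is why <tt class="LITERAL">re.match</tt> is used; the helper name and the test paths are invented for the example.</p>

```python
import re

# The pattern from the text, compiled unchanged.
BANNERS = re.compile(r"/.*/banners/.*")


def path_matches(path):
    """Return True if the path matches, anchored at its first character."""
    return BANNERS.match(path) is not None


deep = path_matches("/eye/hate/spammers/banners/annoy_me_please.gif")
shallow = path_matches("/ads/banners/annoying.html")
unrelated = path_matches("/images/logo.gif")
```

<p>Here <tt class="LITERAL">deep</tt> and <tt class="LITERAL">shallow</tt> are true, while <tt class="LITERAL">unrelated</tt> is false because the path has no <tt class="LITERAL">banners</tt> directory.</p>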
246 And now something a little more complex:
249 <span class="emphasis"><i class="EMPHASIS"><tt class=
250 "LITERAL">/.*/adv((er)?ts?|ertis(ing|ements?))?/</tt></i></span> -
251 We have several literal forward slashes again (<span class=
252 "QUOTE">"/"</span>), so we are building another expression that is
253 a file path statement. We have another <span class=
254 "QUOTE">".*"</span>, so we are matching against any conceivable
255 sub-path, just so it matches our expression. The only true literal
256 that <span class="emphasis"><i class="EMPHASIS">must
257 match</i></span> our pattern is <span class=
258 "APPLICATION">adv</span>, together with the forward slashes. What
259 comes after the <span class="QUOTE">"adv"</span> string is the
263 Remember the <span class="QUOTE">"?"</span> means the preceding
264 expression (either a literal character or anything grouped with
265 <span class="QUOTE">"(...)"</span> in this case) can exist or not,
266 since this means either zero or one match. So <span class=
267 "QUOTE">"((er)?ts?|ertis(ing|ements?))"</span> is optional, as are
268 the individual sub-expressions: <span class="QUOTE">"(er)"</span>,
269 <span class="QUOTE">"(ing|ements?)"</span>, and the <span class=
270 "QUOTE">"s"</span>. The <span class="QUOTE">"|"</span> means <span
271 class="QUOTE">"or"</span>. We have two of those. For instance,
272 <span class="QUOTE">"(ing|ements?)"</span>, can expand to match
273 either <span class="QUOTE">"ing"</span> <span class="emphasis"><i
274 class="EMPHASIS">OR</i></span> <span class=
275 "QUOTE">"ements?"</span>. What is being done here, is an attempt at
276 matching as many variations of <span class=
277 "QUOTE">"advertisement"</span>, and similar, as possible. So this
278 would expand to match just <span class="QUOTE">"adv"</span>, or
279 <span class="QUOTE">"advert"</span>, or <span class=
280 "QUOTE">"adverts"</span>, or <span class=
281 "QUOTE">"advertising"</span>, or <span class=
282 "QUOTE">"advertisement"</span>, or <span class=
283 "QUOTE">"advertisements"</span>. You get the idea. But it would not
284 match <span class="QUOTE">"advertizements"</span> (with a <span
285 class="QUOTE">"z"</span>). We could fix that by changing our
286 regular expression to: <span class=
287 "QUOTE">"/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/"</span>, which
288 would then match either spelling.
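<p>The two variants of the expression can be compared side by side. In this sketch the <tt class="LITERAL">/site/</tt> prefix and the helper function are assumptions made for the example; only the two patterns come from the text.</p>

```python
import re

ORIGINAL = re.compile(r"/.*/adv((er)?ts?|ertis(ing|ements?))?/")
REVISED = re.compile(r"/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/")

WORDS = [
    "adv", "advert", "adverts", "advertising",
    "advertisement", "advertisements", "advertizements",
]


def matching_words(pattern):
    """Return the words that match when used as a directory name."""
    return [word for word in WORDS if pattern.match(f"/site/{word}/")]


original_hits = matching_words(ORIGINAL)
revised_hits = matching_words(REVISED)
```

<p><tt class="LITERAL">original_hits</tt> contains every word except <tt class="LITERAL">advertizements</tt>; <tt class="LITERAL">revised_hits</tt> contains all seven.</p>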
291 <span class="emphasis"><i class="EMPHASIS"><tt class=
292 "LITERAL">/.*/advert[0-9]+\.(gif|jpe?g)</tt></i></span> - Again
293 another path statement with forward slashes. Anything in the square
294 brackets <span class="QUOTE">"[ ]"</span> can be matched. This is
295 using <span class="QUOTE">"0-9"</span> as a shorthand expression to
mean any digit zero through nine. It is the same as saying <span
class="QUOTE">"0123456789"</span>. So any digit matches. The <span
class="QUOTE">"+"</span> means one or more of the preceding
expression must be included. The preceding expression here is what
is in the square brackets -- in this case, any digit zero through
301 nine. Then, at the end, we have a grouping: <span class=
302 "QUOTE">"(gif|jpe?g)"</span>. This includes a <span class=
303 "QUOTE">"|"</span>, so this needs to match the expression on either
304 side of that bar character also. A simple <span class=
305 "QUOTE">"gif"</span> on one side, and the other side will in turn
306 match either <span class="QUOTE">"jpeg"</span> or <span class=
307 "QUOTE">"jpg"</span>, since the <span class="QUOTE">"?"</span>
308 means the letter <span class="QUOTE">"e"</span> is optional and can
be matched once or not at all. So we are building an expression
here to match GIF or JPEG image files. It must include
311 the literal string <span class="QUOTE">"advert"</span>, then one or
312 more digits, and a <span class="QUOTE">"."</span> (which is now a
313 literal, and not a special character, since it is escaped with
314 <span class="QUOTE">"\"</span>), and lastly either <span class=
315 "QUOTE">"gif"</span>, or <span class="QUOTE">"jpeg"</span>, or
316 <span class="QUOTE">"jpg"</span>. Some possible matches would
317 include: <span class="QUOTE">"//advert1.jpg"</span>, <span class=
318 "QUOTE">"/nasty/ads/advert1234.gif"</span>, <span class=
319 "QUOTE">"/banners/from/hell/advert99.jpg"</span>. It would not
320 match <span class="QUOTE">"advert1.gif"</span> (no leading slash),
321 or <span class="QUOTE">"/adverts232.jpg"</span> (the expression
322 does not include an <span class="QUOTE">"s"</span>), or <span
323 class="QUOTE">"/advert1.jsp"</span> (<span class=
324 "QUOTE">"jsp"</span> is not in the expression anywhere).
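<p>The matches and non-matches listed above can be checked mechanically. As before, this is a sketch that assumes the pattern is anchored at the start of the path; the dictionary and its name are illustrative only.</p>

```python
import re

ADVERT_IMAGE = re.compile(r"/.*/advert[0-9]+\.(gif|jpe?g)")

PATHS = [
    "//advert1.jpg",
    "/nasty/ads/advert1234.gif",
    "/banners/from/hell/advert99.jpg",
    "advert1.gif",        # no leading slash
    "/adverts232.jpg",    # the expression does not include an "s"
    "/advert1.jsp",       # "jsp" is not one of the alternatives
]

# Map every sample path to True (matches) or False (does not match).
verdicts = {path: ADVERT_IMAGE.match(path) is not None for path in PATHS}
```

<p>The first three entries of <tt class="LITERAL">verdicts</tt> are true and the last three are false.</p>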
327 We are barely scratching the surface of regular expressions here so
328 that you can understand the default <span class=
329 "APPLICATION">Privoxy</span> configuration files, and maybe use
330 this knowledge to customize your own installation. There is much,
331 much more that can be done with regular expressions. Now that you
332 know enough to get started, you can learn more on your own :/
335 More reading on Perl Compatible Regular expressions: <a href=
336 "http://perldoc.perl.org/perlre.html" target=
337 "_top">http://perldoc.perl.org/perlre.html</a>
340 For information on regular expression based substitutions and their
341 applications in filters, please see the <a href=
342 "filter-file.html">filter file tutorial</a> in this manual.
347 <a name="AEN5636">14.2. Privoxy's Internal Pages</a>
350 Since <span class="APPLICATION">Privoxy</span> proxies each
351 requested web page, it is easy for <span class=
352 "APPLICATION">Privoxy</span> to trap certain special URLs. In this
353 way, we can talk directly to <span class=
354 "APPLICATION">Privoxy</span>, and see how it is configured, see how
355 our rules are being applied, change these rules and other
356 configuration options, and even turn <span class=
357 "APPLICATION">Privoxy's</span> filtering off, all with a web
361 The URLs listed below are the special ones that allow direct access
362 to <span class="APPLICATION">Privoxy</span>. Of course, <span
363 class="APPLICATION">Privoxy</span> must be running to access these.
364 If not, you will get a friendly error message. Internet access is
365 not necessary either.
374 <a name="AEN5650"></a>
375 <blockquote class="BLOCKQUOTE">
377 <a href="http://config.privoxy.org/" target=
378 "_top">http://config.privoxy.org/</a>
382 There is a shortcut: <a href="http://p.p/" target=
383 "_top">http://p.p/</a> (But it doesn't provide a fall-back to a
384 real page, in case the request is not sent through <span class=
385 "APPLICATION">Privoxy</span>)
390 Show information about the current configuration, including
391 viewing and editing of actions files:
393 <a name="AEN5658"></a>
394 <blockquote class="BLOCKQUOTE">
396 <a href="http://config.privoxy.org/show-status" target=
397 "_top">http://config.privoxy.org/show-status</a>
403 Show the source code version numbers:
405 <a name="AEN5663"></a>
406 <blockquote class="BLOCKQUOTE">
408 <a href="http://config.privoxy.org/show-version" target=
409 "_top">http://config.privoxy.org/show-version</a>
415 Show the browser's request headers:
417 <a name="AEN5668"></a>
418 <blockquote class="BLOCKQUOTE">
420 <a href="http://config.privoxy.org/show-request" target=
421 "_top">http://config.privoxy.org/show-request</a>
427 Show which actions apply to a URL and why:
429 <a name="AEN5673"></a>
430 <blockquote class="BLOCKQUOTE">
432 <a href="http://config.privoxy.org/show-url-info" target=
433 "_top">http://config.privoxy.org/show-url-info</a>
439 Toggle Privoxy on or off. This feature can be turned off/on in
440 the main <tt class="FILENAME">config</tt> file. When toggled
441 <span class="QUOTE">"off"</span>, <span class=
442 "QUOTE">"Privoxy"</span> continues to run, but only as a
443 pass-through proxy, with no actions taking place:
445 <a name="AEN5681"></a>
446 <blockquote class="BLOCKQUOTE">
448 <a href="http://config.privoxy.org/toggle" target=
449 "_top">http://config.privoxy.org/toggle</a>
453 Short cuts. Turn off, then on:
455 <a name="AEN5685"></a>
456 <blockquote class="BLOCKQUOTE">
458 <a href="http://config.privoxy.org/toggle?set=disable"
460 "_top">http://config.privoxy.org/toggle?set=disable</a>
463 <a name="AEN5688"></a>
464 <blockquote class="BLOCKQUOTE">
466 <a href="http://config.privoxy.org/toggle?set=enable" target=
467 "_top">http://config.privoxy.org/toggle?set=enable</a>
474 These may be bookmarked for quick reference. See next.
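<p>The same pages can be requested from a script. The sketch below builds the toggle URLs shown above and fetches a page through the proxy. It assumes <span class="APPLICATION">Privoxy</span> listens on <tt class="LITERAL">127.0.0.1:8118</tt>; adjust <tt class="LITERAL">PROXY</tt> if your <tt class="FILENAME">config</tt> file says otherwise. The function names are made up for this example.</p>

```python
import urllib.request

CONFIG_BASE = "http://config.privoxy.org"
# Assumption: Privoxy is listening on this address.
PROXY = "127.0.0.1:8118"


def toggle_url(mode=None):
    """Build the toggle URL; mode is None, "enable" or "disable"."""
    if mode is None:
        return f"{CONFIG_BASE}/toggle"
    if mode not in ("enable", "disable"):
        raise ValueError("mode must be 'enable' or 'disable'")
    return f"{CONFIG_BASE}/toggle?set={mode}"


def fetch_via_privoxy(url, proxy=PROXY, timeout=5):
    """Request a URL through the proxy and return the page text."""
    handler = urllib.request.ProxyHandler({"http": f"http://{proxy}"})
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as response:
        return response.read().decode("iso-8859-1", errors="replace")
```

<p>For instance, <tt class="LITERAL">fetch_via_privoxy(toggle_url("disable"))</tt> would request the <span class="QUOTE">"off"</span> shortcut, provided the toggle feature is enabled in the <tt class="FILENAME">config</tt> file.</p>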
478 <a name="BOOKMARKLETS">14.2.1. Bookmarklets</a>
481 Below are some <span class="QUOTE">"bookmarklets"</span> to allow
482 you to easily access a <span class="QUOTE">"mini"</span> version
483 of some of <span class="APPLICATION">Privoxy's</span> special
484 pages. They are designed for MS Internet Explorer, but should
485 work equally well in Netscape, Mozilla, and other browsers which
486 support JavaScript. They are designed to run directly from your
487 bookmarks - not by clicking the links below (although that should
491 To save them, right-click the link and choose <span class=
492 "QUOTE">"Add to Favorites"</span> (IE) or <span class=
493 "QUOTE">"Add Bookmark"</span> (Netscape). You will get a warning
494 that the bookmark <span class="QUOTE">"may not be safe"</span> -
495 just click OK. Then you can run the Bookmarklet directly from
496 your favorites/bookmarks. For even faster access, you can put
497 them on the <span class="QUOTE">"Links"</span> bar (IE) or the
498 <span class="QUOTE">"Personal Toolbar"</span> (Netscape), and run
499 them with a single click.
507 "javascript:void(window.open('http://config.privoxy.org/toggle?mini=y&set=enabled','ijbstatus','width=250,height=100,resizable=yes,scrollbars=no,toolbar=no,location=no,directories=no,status=no,menubar=no,copyhistory=no').focus());"
508 target="_top">Privoxy - Enable</a>
514 "javascript:void(window.open('http://config.privoxy.org/toggle?mini=y&set=disabled','ijbstatus','width=250,height=100,resizable=yes,scrollbars=no,toolbar=no,location=no,directories=no,status=no,menubar=no,copyhistory=no').focus());"
515 target="_top">Privoxy - Disable</a>
521 "javascript:void(window.open('http://config.privoxy.org/toggle?mini=y&set=toggle','ijbstatus','width=250,height=100,resizable=yes,scrollbars=no,toolbar=no,location=no,directories=no,status=no,menubar=no,copyhistory=no').focus());"
522 target="_top">Privoxy - Toggle Privoxy</a> (Toggles between
523 enabled and disabled)
529 "javascript:void(window.open('http://config.privoxy.org/toggle?mini=y','ijbstatus','width=250,height=2,resizable=yes,scrollbars=no,toolbar=no,location=no,directories=no,status=no,menubar=no,copyhistory=no').focus());"
target="_top">Privoxy - View Status</a>
536 "javascript:void(window.open('http://config.privoxy.org/show-url-info?url='+escape(location.href),'Why').focus());"
537 target="_top">Privoxy - Why?</a>
543 Credit: The site which gave us the general idea for these
544 bookmarklets is <a href="http://www.bookmarklets.com/" target=
545 "_top">www.bookmarklets.com</a>. They have more information about
552 <a name="CHAIN">14.3. Chain of Events</a>
555 Let's take a quick look at how some of <span class=
556 "APPLICATION">Privoxy's</span> core features are triggered, and the
557 ensuing sequence of events when a web page is requested by your
565 First, your web browser requests a web page. The browser knows
566 to send the request to <span class=
567 "APPLICATION">Privoxy</span>, which will in turn, relay the
568 request to the remote web server after passing the following
574 <span class="APPLICATION">Privoxy</span> traps any request for
its own internal CGI pages (e.g. <a href="http://p.p/" target=
576 "_top">http://p.p/</a>) and sends the CGI page back to the
582 Next, <span class="APPLICATION">Privoxy</span> checks to see if
583 the URL matches any <a href="actions-file.html#BLOCK"><span
584 class="QUOTE">"+block"</span></a> patterns. If so, the URL is
585 then blocked, and the remote web server will not be contacted.
586 <a href="actions-file.html#HANDLE-AS-IMAGE"><span class=
587 "QUOTE">"+handle-as-image"</span></a> and <a href=
588 "actions-file.html#HANDLE-AS-EMPTY-DOCUMENT"><span class=
589 "QUOTE">"+handle-as-empty-document"</span></a> are then
590 checked, and if there is no match, an HTML <span class=
591 "QUOTE">"BLOCKED"</span> page is sent back to the browser.
592 Otherwise, if it does match, an image is returned for the
593 former, and an empty text document for the latter. The type of
594 image would depend on the setting of <a href=
595 "actions-file.html#SET-IMAGE-BLOCKER"><span class=
596 "QUOTE">"+set-image-blocker"</span></a> (blank, checkerboard
597 pattern, or an HTTP redirect to an image elsewhere).
602 Untrusted URLs are blocked. If URLs are being added to the <tt
603 class="FILENAME">trust</tt> file, then that is done.
608 If the URL pattern matches the <a href=
609 "actions-file.html#FAST-REDIRECTS"><span class=
610 "QUOTE">"+fast-redirects"</span></a> action, it is then
611 processed. Unwanted parts of the requested URL are stripped.
616 Now the rest of the client browser's request headers are
617 processed. If any of these match any of the relevant actions
618 (e.g. <a href="actions-file.html#HIDE-USER-AGENT"><span class=
619 "QUOTE">"+hide-user-agent"</span></a>, etc.), headers are
620 suppressed or forged as determined by these actions and their
626 Now the web server starts sending its response back (i.e.
627 typically a web page).
632 First, the server headers are read and processed to determine,
633 among other things, the MIME type (document type) and encoding.
634 The headers are then filtered as determined by the <a href=
635 "actions-file.html#CRUNCH-INCOMING-COOKIES"><span class=
636 "QUOTE">"+crunch-incoming-cookies"</span></a>, <a href=
637 "actions-file.html#SESSION-COOKIES-ONLY"><span class=
638 "QUOTE">"+session-cookies-only"</span></a>, and <a href=
639 "actions-file.html#DOWNGRADE-HTTP-VERSION"><span class=
640 "QUOTE">"+downgrade-http-version"</span></a> actions.
645 If any <a href="actions-file.html#FILTER"><span class=
646 "QUOTE">"+filter"</span></a> action or <a href=
647 "actions-file.html#DEANIMATE-GIFS"><span class=
648 "QUOTE">"+deanimate-gifs"</span></a> action applies (and the
649 document type fits the action), the rest of the page is read
650 into memory (up to a configurable limit). Then the filter rules
651 (from <tt class="FILENAME">default.filter</tt> and any other
652 filter files) are processed against the buffered content.
653 Filters are applied in the order they are specified in one of
654 the filter files. Animated GIFs, if present, are reduced to
655 either the first or last frame, depending on the action
setting. The entire page, which is now filtered, is then sent by
657 <span class="APPLICATION">Privoxy</span> back to your browser.
660 If neither a <a href="actions-file.html#FILTER"><span class=
"QUOTE">"+filter"</span></a> action nor a <a href=
662 "actions-file.html#DEANIMATE-GIFS"><span class=
663 "QUOTE">"+deanimate-gifs"</span></a> matches, then <span class=
664 "APPLICATION">Privoxy</span> passes the raw data through to the
665 client browser as it becomes available.
As the browser receives the (possibly filtered) page
671 content, it reads and then requests any URLs that may be
672 embedded within the page source, e.g. ad images, stylesheets,
673 JavaScript, other HTML documents (e.g. frames), sounds, etc.
674 For each of these objects, the browser issues a separate
675 request (this is easily viewable in <span class=
676 "APPLICATION">Privoxy's</span> logs). And each such request is
677 in turn processed just as above. Note that a complex web page
678 will have many, many such embedded URLs. If these secondary
679 requests are to a different server, then quite possibly a very
680 differing set of actions is triggered.
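<p>The order of these checks can be summarized in a small model. This is a deliberately simplified sketch of the sequence described above, not <span class="APPLICATION">Privoxy's</span> actual code: the function names, the step descriptions, and the representation of actions as a set of names are all assumptions made for illustration.</p>

```python
INTERNAL_HOSTS = {"config.privoxy.org", "p.p"}


def request_steps(host, actions, trusted=True):
    """Return the steps taken for one request, in order (simplified)."""
    if host in INTERNAL_HOSTS:
        return ["serve internal CGI page"]
    if "block" in actions:
        # The kind of block response depends on two further actions.
        if "handle-as-image" in actions:
            return ["send replacement image"]
        if "handle-as-empty-document" in actions:
            return ["send empty document"]
        return ["send BLOCKED page"]
    if not trusted:
        return ["block untrusted URL"]
    steps = []
    if "fast-redirects" in actions:
        steps.append("strip unwanted parts of the URL")
    steps.append("process client headers")
    steps.append("forward request to server")
    return steps


def response_mode(actions, type_fits=True):
    """Decide whether the response body is buffered or streamed."""
    wants_buffer = "filter" in actions or "deanimate-gifs" in actions
    if wants_buffer and type_fits:
        return "buffer, filter, then send"
    return "pass through as it arrives"
```

<p>Each embedded URL that the browser requests afterwards would go through <tt class="LITERAL">request_steps</tt> again, possibly with a different set of actions.</p>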
NOTE: This is a somewhat simplistic overview of what happens
687 with each URL request. For the sake of brevity and simplicity, we
688 have focused on <span class="APPLICATION">Privoxy's</span> core
694 <a name="ACTIONSANAT">14.4. Troubleshooting: Anatomy of an
The way <span class="APPLICATION">Privoxy</span> applies <a href=
"actions-file.html#ACTIONS">actions</a> and <a href=
"actions-file.html#FILTER">filters</a> to any given URL can be
complex, and it is not always easy to understand what is happening.
Sometimes we need to be able to <span class="emphasis"><i
class="EMPHASIS">see</i></span> just what <span class=
"APPLICATION">Privoxy</span> is doing, especially if something
<span class="APPLICATION">Privoxy</span> is doing is inadvertently
causing us a problem. It can be a little daunting to look at the
707 actions and filters files themselves, since they tend to be filled
708 with <a href="appendix.html#REGEX">regular expressions</a> whose
709 consequences are not always so obvious.
712 One quick test to see if <span class="APPLICATION">Privoxy</span>
713 is causing a problem or not, is to disable it temporarily. This
714 should be the first troubleshooting step. See <a href=
715 "appendix.html#BOOKMARKLETS">the Bookmarklets</a> section on a
716 quick and easy way to do this (be sure to flush caches afterward!).
717 Looking at the logs is a good idea too. (Note that both the toggle
718 feature and logging are enabled via <tt class=
719 "FILENAME">config</tt> file settings, and may need to be turned
720 <span class="QUOTE">"on"</span>.)
Another easy troubleshooting step, if you have customized your
installation, is to revert to the installed defaults and see if
that helps. The developers sometimes get complaints about one thing
or another when the problem is really caused by a customized
configuration.
730 <span class="APPLICATION">Privoxy</span> also provides the <a href=
731 "http://config.privoxy.org/show-url-info" target=
732 "_top">http://config.privoxy.org/show-url-info</a> page that can
733 show us very specifically how <span class=
734 "APPLICATION">actions</span> are being applied to any given URL.
735 This is a big help for troubleshooting.
738 First, enter one URL (or partial URL) at the prompt, and then <span
739 class="APPLICATION">Privoxy</span> will tell us how the current
740 configuration will handle it. This will not help with filtering
741 effects (i.e. the <a href="actions-file.html#FILTER"><span class=
742 "QUOTE">"+filter"</span></a> action) from one of the filter files
since filtering is handled very differently and is not so easy to trap. It
744 also will not tell you about any other URLs that may be embedded
745 within the URL you are testing. For instance, images such as ads
746 are expressed as URLs within the raw page source of HTML pages. So
747 you will only get info for the actual URL that is pasted into the
748 prompt area -- not any sub-URLs. If you want to know about embedded
749 URLs like ads, you will have to dig those out of the HTML source.
750 Use your browser's <span class="QUOTE">"View Page Source"</span>
751 option for this. Or right click on the ad, and grab the URL.
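<p>Instead of typing the URL at the prompt, it can be passed in the query string, as the <span class="QUOTE">"Why?"</span> bookmarklet does with its <tt class="LITERAL">url</tt> parameter. The sketch below builds such a link; the function name is invented, and percent-encoding with <tt class="LITERAL">quote</tt> is used here in place of the bookmarklet's JavaScript <tt class="LITERAL">escape()</tt>.</p>

```python
from urllib.parse import quote

SHOW_URL_INFO = "http://config.privoxy.org/show-url-info"


def show_url_info_link(url):
    """Build a show-url-info link for the given URL."""
    # safe="" percent-encodes ":" and "/" so the URL is a single query value.
    return f"{SHOW_URL_INFO}?url={quote(url, safe='')}"


link = show_url_info_link("http://www.google.com/")
```

<p>Opening <tt class="LITERAL">link</tt> in a browser that is configured to use <span class="APPLICATION">Privoxy</span> should show the report for that URL.</p>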
754 Let's try an example, <a href="http://google.com" target=
755 "_top">google.com</a>, and look at it one section at a time in a
756 sample configuration (your real configuration may vary):
760 <table border="0" bgcolor="#E0E0E0" width="100%">
764 Matches for http://www.google.com:
766 In file: default.action <span class="GUIBUTTON">[ View ]</span> <span class=
767 "GUIBUTTON">[ Edit ]</span>
769 {+change-x-forwarded-for{block}
770 +deanimate-gifs {last}
771 +fast-redirects {check-decoded-url}
772 +filter {refresh-tags}
773 +filter {img-reorder}
774 +filter {banners-by-size}
776 +filter {jumping-windows}
777 +filter {ie-exploits}
778 +hide-from-header {block}
779 +hide-referrer {forge}
780 +session-cookies-only
781 +set-image-blocker {pattern}
784 { -session-cookies-only }
790 In file: user.action <span class="GUIBUTTON">[ View ]</span> <span class=
791 "GUIBUTTON">[ Edit ]</span>
792 (no matches in this file)
799 This is telling us how we have defined our <a href=
800 "actions-file.html#ACTIONS"><span class=
801 "QUOTE">"actions"</span></a>, and which ones match for our test
case, <span class="QUOTE">"google.com"</span>. Displayed are all the
803 actions that are available to us. Remember, the <tt class=
804 "LITERAL">+</tt> sign denotes <span class="QUOTE">"on"</span>. <tt
805 class="LITERAL">-</tt> denotes <span class="QUOTE">"off"</span>. So
806 some are <span class="QUOTE">"on"</span> here, but many are <span
807 class="QUOTE">"off"</span>. Each example we try may provide a
808 slightly different end result, depending on our configuration
812 The first listing is for our <tt class=
813 "FILENAME">default.action</tt> file. The large, multi-line listing,
814 is how the actions are set to match for all URLs, i.e. our default
815 settings. If you look at your <span class="QUOTE">"actions"</span>
816 file, this would be the section just below the <span class=
817 "QUOTE">"aliases"</span> section near the top. This will apply to
818 all URLs as signified by the single forward slash at the end of the
819 listing -- <span class="QUOTE">" / "</span>.
822 But we have defined additional actions that would be exceptions to
823 these general rules, and then we list specific URLs (or patterns)
824 that these exceptions would apply to. Last match wins. Just below
825 this then are two explicit matches for <span class=
826 "QUOTE">".google.com"</span>. The first is negating our previous
827 cookie setting, which was for <a href=
828 "actions-file.html#SESSION-COOKIES-ONLY"><span class=
829 "QUOTE">"+session-cookies-only"</span></a> (i.e. not persistent).
830 So we will allow persistent cookies for google, at least that is
831 how it is in this example. The second turns <span class=
832 "emphasis"><i class="EMPHASIS">off</i></span> any <a href=
833 "actions-file.html#FAST-REDIRECTS"><span class=
834 "QUOTE">"+fast-redirects"</span></a> action, allowing this to take
835 place unmolested. Note that there is a leading dot here -- <span
class="QUOTE">".google.com"</span>. This will also match any hosts and
sub-domains in the google.com domain, such as <span class=
838 "QUOTE">"www.google.com"</span> or <span class=
839 "QUOTE">"mail.google.com"</span>. But it would not match <span
840 class="QUOTE">"www.google.de"</span>! So, apparently, we have these
841 two actions defined as exceptions to the general rules at the top
842 somewhere in the lower part of our <tt class=
843 "FILENAME">default.action</tt> file, and <span class=
844 "QUOTE">"google.com"</span> is referenced somewhere in these latter
848 Then, for our <tt class="FILENAME">user.action</tt> file, we again
849 have no hits. So there is nothing google-specific that we might
850 have added to our own, local configuration. If there was, those
851 actions would over-rule any actions from previously processed
852 files, such as <tt class="FILENAME">default.action</tt>. <tt class=
853 "FILENAME">user.action</tt> typically has the last word. This is
the best place to put hard and fast exceptions.
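<p>The <span class="QUOTE">"last match wins"</span> rule and the leading-dot host pattern can be modelled in a few lines. This is a rough sketch, not <span class="APPLICATION">Privoxy's</span> matching code: host patterns are reduced to the leading-dot case, <tt class="LITERAL">/</tt> stands for <span class="QUOTE">"all URLs"</span>, and the function names and the sample sections are assumptions based on the listing above.</p>

```python
def host_matches(pattern, host):
    """Simplified host matching: a leading dot covers the whole domain."""
    if pattern.startswith("."):
        domain = pattern[1:]
        # Assumed: the bare domain and any sub-domain both match.
        return host == domain or host.endswith("." + domain)
    return host == pattern


def apply_sections(sections, host):
    """Merge matching sections in order, so later matches override earlier ones."""
    final = {}
    for pattern, actions in sections:
        if pattern == "/" or host_matches(pattern, host):
            final.update(actions)
    return final


# Sections in the order they would be read; False means the action is off.
SECTIONS = [
    ("/", {"session-cookies-only": True,
           "fast-redirects": "check-decoded-url",
           "hide-referrer": "forge"}),
    (".google.com", {"session-cookies-only": False}),
    (".google.com", {"fast-redirects": False}),
]

google = apply_sections(SECTIONS, "www.google.com")
google_de = apply_sections(SECTIONS, "www.google.de")
```

<p>For <tt class="LITERAL">www.google.com</tt> both exceptions apply and switch the two actions off; for <tt class="LITERAL">www.google.de</tt> only the defaults remain. Sections from <tt class="FILENAME">user.action</tt> would simply be appended to the list, which is why that file typically has the last word.</p>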
857 And finally we pull it all together in the bottom section and
858 summarize how <span class="APPLICATION">Privoxy</span> is applying
859 all its <span class="QUOTE">"actions"</span> to <span class=
860 "QUOTE">"google.com"</span>:
864 <table border="0" bgcolor="#E0E0E0" width="100%">
872 +change-x-forwarded-for{block}
873 -client-header-filter{hide-tor-exit-notation}
874 -content-type-overwrite
875 -crunch-client-header
876 -crunch-if-none-match
877 -crunch-incoming-cookies
878 -crunch-outgoing-cookies
879 -crunch-server-header
880 +deanimate-gifs {last}
881 -downgrade-http-version
884 -filter {content-cookies}
886 -filter {banners-by-link}
887 -filter {tiny-textforms}
888 -filter {frameset-borders}
889 -filter {demoronizer}
890 -filter {shockwave-flash}
891 -filter {quicktime-kioskmode}
893 -filter {crude-parental}
894 -filter {site-specifics}
895 -filter {js-annoyances}
896 -filter {html-annoyances}
897 +filter {refresh-tags}
898 -filter {unsolicited-popups}
899 +filter {img-reorder}
900 +filter {banners-by-size}
902 +filter {jumping-windows}
903 +filter {ie-exploits}
910 -handle-as-empty-document
912 -hide-accept-language
913 -hide-content-disposition
914 +hide-from-header {block}
915 -hide-if-modified-since
916 +hide-referrer {forge}
919 -overwrite-last-modified
922 -server-header-filter{xml-to-html}
923 -server-header-filter{html-to-xml}
924 -session-cookies-only
925 +set-image-blocker {pattern}
Notice that the only differences from the previous listing are
<span class="QUOTE">"fast-redirects"</span> and <span class=
"QUOTE">"session-cookies-only"</span>, which are turned off
specifically for this site in our configuration, and thus show as
disabled in the <span class="QUOTE">"Final Results"</span>.
939 Now another example, <span class=
940 "QUOTE">"ad.doubleclick.net"</span>:
944 <table border="0" bgcolor="#E0E0E0" width="100%">
948 { +block{Domains starts with "ad"} }
951 { +block{Domain contains "ad"} }
954 { +block{Doubleclick banner server} +handle-as-image }
955 .[a-vx-z]*.doubleclick.net
962 We'll just show the interesting part here: the explicit matches.
963 The URL is matched three different times: by two <span class=
964 "QUOTE">"+block{}"</span> sections, and by a <span class=
965 "QUOTE">"+block{} +handle-as-image"</span> section, which is the expanded
966 form of one of our aliases that had been defined as: <span class=
967 "QUOTE">"+block-as-image"</span>. (<a href=
968 "actions-file.html#ALIASES"><span class=
969 "QUOTE">"Aliases"</span></a> are defined in the first section of
970 the actions file and typically used to combine more than one
974 Any one of these would have done the trick and blocked this as an
975 unwanted image. This is unnecessarily redundant since the last case
976 effectively would also cover the first. No point in taking chances
977 with these guys though ;-) Note that if you want an ad or obnoxious
978 URL to be invisible, it should be defined as <span class=
979 "QUOTE">"ad.doubleclick.net"</span> is done here -- as both a <a
980 href="actions-file.html#BLOCK"><span class=
981 "QUOTE">"+block{}"</span></a> <span class="emphasis"><i class=
982 "EMPHASIS">and</i></span> a <a href=
983 "actions-file.html#HANDLE-AS-IMAGE"><span class=
984 "QUOTE">"+handle-as-image"</span></a>. The custom alias <span
985 class="QUOTE">"<tt class="LITERAL">+block-as-image</tt>"</span>
986 just simplifies the process and makes it more readable.
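As a rough illustration only, alias expansion can be pictured as plain token
substitution. The following Python sketch is not Privoxy code: the
<tt class="LITERAL">ALIASES</tt> table and the <tt class="LITERAL">expand()</tt>
helper are hypothetical names, and the two alias definitions are simply the
expansions quoted in this appendix.

```python
# Hypothetical alias table; the expansions are the ones quoted in this appendix.
ALIASES = {
    "+block-as-image": "+block{} +handle-as-image",
    "shop": "-filter -session-cookies-only",
}


def expand(header: str) -> str:
    """Replace each alias token in a '{ ... }' section header by its expansion."""
    inner = header.strip()
    # Drop only the outermost braces, so action parameters such as
    # '+block{}' keep their own braces.
    if inner.startswith("{") and inner.endswith("}"):
        inner = inner[1:-1]
    tokens = inner.split()
    expanded = [ALIASES.get(token, token) for token in tokens]
    return "{ " + " ".join(expanded) + " }"


if __name__ == "__main__":
    print(expand("{ +block-as-image }"))  # { +block{} +handle-as-image }
    print(expand("{ shop }"))             # { -filter -session-cookies-only }
```

Tokens that are not aliases pass through unchanged, so an already expanded
header comes back as it went in.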
989 One last example. Let's try <span class=
990 "QUOTE">"http://www.example.net/adsl/HOWTO/"</span>. This one is
991 giving us problems. We are getting a blank page. Hmmm ...
995 <table border="0" bgcolor="#E0E0E0" width="100%">
999 Matches for http://www.example.net/adsl/HOWTO/:
1001 In file: default.action <span class="GUIBUTTON">[ View ]</span> <span class=
1002 "GUIBUTTON">[ Edit ]</span>
1006 +change-x-forwarded-for{block}
1007 -client-header-filter{hide-tor-exit-notation}
1008 -content-type-overwrite
1009 -crunch-client-header
1010 -crunch-if-none-match
1011 -crunch-incoming-cookies
1012 -crunch-outgoing-cookies
1013 -crunch-server-header
1015 -downgrade-http-version
1016 +fast-redirects {check-decoded-url}
1018 -filter {content-cookies}
1019 -filter {all-popups}
1020 -filter {banners-by-link}
1021 -filter {tiny-textforms}
1022 -filter {frameset-borders}
1023 -filter {demoronizer}
1024 -filter {shockwave-flash}
1025 -filter {quicktime-kioskmode}
1027 -filter {crude-parental}
1028 -filter {site-specifics}
1029 -filter {js-annoyances}
1030 -filter {html-annoyances}
1031 +filter {refresh-tags}
1032 -filter {unsolicited-popups}
1033 +filter {img-reorder}
1034 +filter {banners-by-size}
1036 +filter {jumping-windows}
1037 +filter {ie-exploits}
1044 -handle-as-empty-document
1046 -hide-accept-language
1047 -hide-content-disposition
1048 +hide-from-header{block}
1049 +hide-referer{forge}
1051 -overwrite-last-modified
1052 +prevent-compression
1054 -server-header-filter{xml-to-html}
1055 -server-header-filter{html-to-xml}
1056 +session-cookies-only
1057 +set-image-blocker{blank} }
1060 { +block{Path contains "ads".} +handle-as-image }
1068 Ooops, the <span class="QUOTE">"/adsl/"</span> is matching <span
1069 class="QUOTE">"/ads"</span> in our configuration! But we did not
1070 want this at all! Now we see why we get the blank page. It is
1071 actually triggering two different actions here, and the effects are
1072 aggregated so that the URL is blocked, and <span class=
1073 "APPLICATION">Privoxy</span> is told to treat the block as if it
1074 were an image. But this is, of course, all wrong. We could now add
1075 a new action below this (or better in our own <tt class=
1076 "FILENAME">user.action</tt> file) that explicitly <span class=
1077 "emphasis"><i class="EMPHASIS">un</i></span>blocks (<a href=
1078 "actions-file.html#BLOCK"><span class=
1079 "QUOTE">"{-block}"</span></a>) paths with <span class=
1080 "QUOTE">"adsl"</span> in them (remember, last match in the
1081 configuration wins). There are various ways to handle such
1082 exceptions. Example:
1086 <table border="0" bgcolor="#E0E0E0" width="100%">
1089 <pre class="SCREEN">
1098 Now the page displays ;-) Remember to flush your browser's caches
1099 when making these kinds of changes to your configuration to ensure
1100 that you get a freshly delivered page! Or, try using <tt class=
1101 "LITERAL">Shift+Reload</tt>.
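To make the reasoning above concrete, here is a minimal Python sketch of the
two effects at work: a short path pattern matching more than intended, and a
later section overriding an earlier one. It is a toy model, not Privoxy's
matching code. It assumes that a path pattern behaves like a regular
expression anchored at the start of the path, and the names
<tt class="LITERAL">SECTIONS</tt> and <tt class="LITERAL">final_actions()</tt>
are hypothetical.

```python
import re

# Toy model: each section is (actions, path_pattern), in processing order.
# Assumption of this sketch: patterns are regular expressions anchored at
# the start of the path only.
SECTIONS = [
    ({"block": True, "handle-as-image": True}, r"/ads"),  # as in default.action
    ({"block": False}, r"/adsl"),                         # exception, e.g. in user.action
]


def final_actions(path, sections):
    """Apply every matching section in order; later settings override earlier ones."""
    result = {}
    for actions, pattern in sections:
        if re.match(pattern, path):  # re.match anchors at the start only
            result.update(actions)
    return result


if __name__ == "__main__":
    # Without the exception, "/ads" also matches "/adsl/HOWTO/" and the page is blocked.
    print(final_actions("/adsl/HOWTO/", SECTIONS[:1]))
    # With the exception processed last, "block" ends up switched off.
    print(final_actions("/adsl/HOWTO/", SECTIONS))
```

In this model the exception only switches off <tt class="LITERAL">block</tt>;
the image handling set earlier is left as it was and no longer matters once
the request is not blocked.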
1104 But now what about a situation where we get no explicit matches
1109 <table border="0" bgcolor="#E0E0E0" width="100%">
1112 <pre class="SCREEN">
1113 { +block{Path starts with "ads".} +handle-as-image }
1121 That actually was very helpful and pointed us quickly to where the
1122 problem was. If you don't get this kind of match, then it means one
1123 of the default rules in the first section of <tt class=
1124 "FILENAME">default.action</tt> is causing the problem. This would
1125 require some guesswork, and maybe a little trial and error to
1126 isolate the offending rule. One likely cause would be one of the <a
1127 href="actions-file.html#FILTER"><span class=
1128 "QUOTE">"+filter"</span></a> actions. These tend to be harder to
1129 troubleshoot. Try adding the URL for the site to one of the aliases
1130 that turn off <a href="actions-file.html#FILTER"><span class=
1131 "QUOTE">"+filter"</span></a>:
1135 <table border="0" bgcolor="#E0E0E0" width="100%">
1138 <pre class="SCREEN">
1141 .worldpay.com # for quietpc.com
1151 <span class="QUOTE">"<tt class="LITERAL">{ shop }</tt>"</span> is
1152 an <span class="QUOTE">"alias"</span> that expands to <span class=
1153 "QUOTE">"<tt class="LITERAL">{ -filter -session-cookies-only
1154 }</tt>"</span>. Or you could do your own exception to negate
1159 <table border="0" bgcolor="#E0E0E0" width="100%">
1162 <pre class="SCREEN">
1164 # Disable ALL filter actions for sites in this section
1174 This would turn off all filtering for these sites. This is best put
1175 in <tt class="FILENAME">user.action</tt>, for local site
1176 exceptions. Note that when a simple domain pattern is used by
1177 itself (without the subsequent path portion), all sub-pages within
1178 that domain are included automatically in the scope of the action.
1181 Images that are inexplicably being blocked may well be hitting the
1182 <a href="actions-file.html#FILTER-BANNERS-BY-SIZE"><span class=
1183 "QUOTE">"+filter{banners-by-size}"</span></a> rule, which assumes
1184 that images of certain sizes are ad banners (works well <span
1185 class="emphasis"><i class="EMPHASIS">most of the time</i></span>
1186 since these tend to be standardized).
1189 <span class="QUOTE">"<tt class="LITERAL">{ fragile }</tt>"</span>
1190 is an alias that disables most of the actions that are most likely to
1191 cause trouble. This can be used as a last resort for problem sites.
1195 <table border="0" bgcolor="#E0E0E0" width="100%">
1198 <pre class="SCREEN">
1200 # Handle with care: easy to break
1209 <span class="emphasis"><i class="EMPHASIS">Remember to flush
1210 caches!</i></span> Note that the <tt class=
1211 "LITERAL">mail.google</tt> reference lacks the TLD portion (e.g.
1212 <span class="QUOTE">".com"</span>). This will effectively match
1213 <tt class="LITERAL">mail.google</tt> under any TLD, such as <tt class=
1214 "LITERAL">mail.google.de</tt>, just as an example.
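The effect of leaving off the TLD can be sketched in a few lines of Python.
This is a simplified model under stated assumptions, not Privoxy's
implementation: it treats a pattern that ends in a dot as open on the right,
a pattern that starts with a dot as open on the left, and it ignores
wildcards inside labels. The helper name
<tt class="LITERAL">host_matches()</tt> is hypothetical.

```python
def host_matches(pattern: str, host: str) -> bool:
    """Simplified label-wise host match; wildcards inside labels are not handled."""
    open_left = pattern.startswith(".")
    open_right = pattern.endswith(".")
    want = [label.lower() for label in pattern.split(".") if label]
    have = host.lower().rstrip(".").split(".")
    if not want:
        return True  # a bare "." places no constraint in this model
    n = len(want)
    if open_left and open_right:
        # The pattern labels may appear anywhere as a contiguous run.
        return any(have[i:i + n] == want for i in range(len(have) - n + 1))
    if open_right:
        return have[:n] == want   # host must start with the pattern labels
    if open_left:
        return have[-n:] == want  # host must end with the pattern labels
    return have == want           # anchored on both sides


if __name__ == "__main__":
    for host in ("mail.google.com", "mail.google.de", "www.google.com"):
        print(host, host_matches("mail.google.", host))
```

With the trailing dot, both <tt class="LITERAL">mail.google.com</tt> and
<tt class="LITERAL">mail.google.de</tt> match in this model, while
<tt class="LITERAL">www.google.com</tt> does not.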
1217 If this still does not work, you will have to go through the
1218 remaining actions one by one to find which one(s) is causing the