>Appendix</TITLE
><META
NAME="GENERATOR"
-CONTENT="Modular DocBook HTML Stylesheet Version 1.64
+CONTENT="Modular DocBook HTML Stylesheet Version 1.76b+
"><LINK
REL="HOME"
-TITLE="Privoxy User Manual"
+TITLE="Privoxy 3.0.4 User Manual"
HREF="index.html"><LINK
REL="PREVIOUS"
TITLE="See Also"
><DIV
CLASS="NAVHEADER"
><TABLE
+SUMMARY="Header navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
><TH
COLSPAN="3"
ALIGN="center"
->Privoxy User Manual</TH
+>Privoxy 3.0.4 User Manual</TH
></TR
><TR
><TD
VALIGN="bottom"
><A
HREF="seealso.html"
+ACCESSKEY="P"
>Prev</A
></TD
><TD
CLASS="SECT1"
><A
NAME="APPENDIX"
->9. Appendix</A
-></H1
+></A
+>14. Appendix</H1
><DIV
CLASS="SECT2"
><H2
CLASS="SECT2"
><A
NAME="REGEX"
->9.1. Regular Expressions</A
-></H2
+></A
+>14.1. Regular Expressions</H2
><P
> <SPAN
CLASS="APPLICATION"
>Privoxy</SPAN
-> can use <SPAN
-CLASS="QUOTE"
->"regular expressions"</SPAN
->
- in various config files. Assuming support for <SPAN
+> uses Perl-style <SPAN
CLASS="QUOTE"
->"pcre"</SPAN
-> (Perl
- Compatible Regular Expressions) is compiled in, which is the default. Such
- configuration directives do not require regular expressions, but they can be
- used to increase flexibility by matching a pattern with wild-cards against
- URLs.</P
+>"regular
+ expressions"</SPAN
+> in its <A
+HREF="actions-file.html"
+>actions
+ files</A
+> and <A
+HREF="filter-file.html"
+>filter file</A
+>,
+ through the <A
+HREF="http://www.pcre.org/"
+TARGET="_top"
+>PCRE</A
+> and
+ <SPAN
+CLASS="APPLICATION"
+>PCRS</SPAN
+> libraries.</P
><P
> If you are reading this, you probably don't understand what <SPAN
CLASS="QUOTE"
>"regular
expressions"</SPAN
> are, or what they can do. So this will be a very brief
- introduction only. A full explanation would require a book ;-)</P
+ introduction only. A full explanation would require a <A
+HREF="http://www.oreilly.com/catalog/regex/"
+TARGET="_top"
+>book</A
+> ;-)</P
><P
-> <SPAN
+> Regular expressions provide a language to describe patterns that can be
+ run against strings of characters (letters, numbers, etc.), to see if they
+ match the string or not. The patterns are themselves (sometimes complex)
+ strings of literal characters, combined with wild-cards, and other special
+ characters, called meta-characters. The <SPAN
CLASS="QUOTE"
->"Regular expressions"</SPAN
-> is a way of matching one character
- expression against another to see if it matches or not. One of the
+>"meta-characters"</SPAN
+> have
+ special meanings and are used to build complex patterns to be matched against.
+ Perl Compatible Regular Expressions are an especially convenient
<SPAN
CLASS="QUOTE"
->"expressions"</SPAN
-> is a literal string of readable characters
- (letter, numbers, etc), and the other is a complex string of literal
- characters combined with wild-cards, and other special characters, called
- meta-characters. The <SPAN
-CLASS="QUOTE"
->"meta-characters"</SPAN
-> have special meanings and
- are used to build the complex pattern to be matched against. Perl Compatible
- Regular Expressions is an enhanced form of the regular expression language
- with backward compatibility.</P
+>"dialect"</SPAN
+> of the regular expression language.</P
><P
> To make a simple analogy, we do something similar when we use wild-card
characters when listing files with the <B
building complex patterns however. Let's look at a few of the common ones,
and then some examples:</P
><P
+><P
></P
><TABLE
BORDER="0"
><TBODY
><TR
><TD
-> <I
+> <SPAN
+CLASS="emphasis"
+><I
CLASS="EMPHASIS"
>.</I
+></SPAN
> - Matches any single character, e.g. <SPAN
CLASS="QUOTE"
>"a"</SPAN
></TABLE
><P
></P
+></P
+><P
><P
></P
><TABLE
><TBODY
><TR
><TD
-> <I
+> <SPAN
+CLASS="emphasis"
+><I
CLASS="EMPHASIS"
>?</I
+></SPAN
> - The preceding character or expression is matched ZERO or ONE
times. Either/or.
</TD
></TABLE
><P
></P
+></P
+><P
><P
></P
><TABLE
><TBODY
><TR
><TD
-> <I
+> <SPAN
+CLASS="emphasis"
+><I
CLASS="EMPHASIS"
>+</I
+></SPAN
> - The preceding character or expression is matched ONE or MORE
times.
</TD
></TABLE
><P
></P
+></P
+><P
><P
></P
><TABLE
><TBODY
><TR
><TD
-> <I
+> <SPAN
+CLASS="emphasis"
+><I
CLASS="EMPHASIS"
>*</I
+></SPAN
> - The preceding character or expression is matched ZERO or MORE
times.
</TD
></TABLE
><P
></P
+></P
+><P
><P
></P
><TABLE
><TBODY
><TR
><TD
-> <I
+> <SPAN
+CLASS="emphasis"
+><I
CLASS="EMPHASIS"
>\</I
+></SPAN
> - The <SPAN
CLASS="QUOTE"
>"escape"</SPAN
CLASS="QUOTE"
>"."</SPAN
>) needs to be taken literally and
- not as a special meta-character.
+ not as a special meta-character. Example: <SPAN
+CLASS="QUOTE"
+>"example\.com"</SPAN
+>, makes
+ sure the period is recognized only as a period (and not expanded to its
+ meta-character meaning of any single character).
</TD
></TR
></TBODY
></TABLE
><P
></P
+></P
+><P
><P
></P
><TABLE
><TBODY
><TR
><TD
-> <I
+> <SPAN
+CLASS="emphasis"
+><I
CLASS="EMPHASIS"
>[]</I
+></SPAN
> - Characters enclosed in brackets will be matched if
- any of the enclosed characters are encountered.
+ any of the enclosed characters are encountered. For instance, <SPAN
+CLASS="QUOTE"
+>"[0-9]"</SPAN
+>
+ matches any numeric digit (zero through nine). As an example, we can combine
+ this with <SPAN
+CLASS="QUOTE"
+>"+"</SPAN
+> to match any digit one or more times: <SPAN
+CLASS="QUOTE"
+>"[0-9]+"</SPAN
+>.
</TD
></TR
></TBODY
></TABLE
><P
></P
+></P
+><P
><P
></P
><TABLE
><TBODY
><TR
><TD
-> <I
+> <SPAN
+CLASS="emphasis"
+><I
CLASS="EMPHASIS"
>()</I
+></SPAN
> - parentheses are used to group a sub-expression,
or multiple sub-expressions.
</TD
></TABLE
><P
></P
+></P
+><P
><P
></P
><TABLE
><TBODY
><TR
><TD
-> <I
+> <SPAN
+CLASS="emphasis"
+><I
CLASS="EMPHASIS"
>|</I
+></SPAN
> - The <SPAN
CLASS="QUOTE"
>"bar"</SPAN
sub-expression on either side of <SPAN
CLASS="QUOTE"
>"|"</SPAN
-> matches.
- </TD
-></TR
-></TBODY
-></TABLE
-><P
-></P
-><P
-></P
-><TABLE
-BORDER="0"
-><TBODY
-><TR
-><TD
-> <I
-CLASS="EMPHASIS"
->s/string1/string2/g</I
-> - This is used to rewrite strings of text.
+> matches. As an example:
<SPAN
CLASS="QUOTE"
->"string1"</SPAN
-> is replaced by <SPAN
+>"/(this|that) example/"</SPAN
+> uses grouping and the bar character
+ and would match either <SPAN
CLASS="QUOTE"
->"string2"</SPAN
-> in this
- example.
+>"this example"</SPAN
+> or <SPAN
+CLASS="QUOTE"
+>"that
+ example"</SPAN
+>, and nothing else.
</TD
></TR
></TBODY
></TABLE
><P
></P
+></P
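You can try the meta-characters above directly. The following is a minimal sketch in Python. Its `re` module stands in here for PCRE: for these simple patterns the two behave alike, although the engines are not identical. The sketch uses `re.fullmatch`, so the whole test string has to match. The helper name `matches` and the sample strings are our own illustrations. The slashes in "/(this|that) example/" are read as Perl-style delimiters and are left out.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None


# (pattern, text) pairs, one or two per meta-character described above.
CASES = [
    (r"a.c", "abc"),                            # .   any single character
    (r"colou?r", "color"),                      # ?   zero or one "u"
    (r"ab+c", "ac"),                            # +   needs at least one "b": no match
    (r"ab*c", "ac"),                            # *   zero "b"s is fine
    (r"example.com", "exampleXcom"),            # unescaped . also matches the "X"
    (r"example\.com", "exampleXcom"),           # \.  only a literal period will do
    (r"[0-9]+", "2004"),                        # []+ one or more digits
    (r"(this|that) example", "that example"),   # () and | : either alternative
    (r"(this|that) example", "other example"),  # neither alternative fits
]

if __name__ == "__main__":
    for pattern, text in CASES:
        print(f"{pattern!r:26} {text!r:18} {matches(pattern, text)}")
```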
><P
> These are just some of the ones you are likely to use when matching URLs with
<SPAN
list. This is enough to get us started with a few simple examples which may
be more illuminating:</P
><P
-> <I
+> <SPAN
+CLASS="emphasis"
+><I
CLASS="EMPHASIS"
><TT
CLASS="LITERAL"
>/.*/banners/.*</TT
></I
+></SPAN
> - A simple example
that uses the common combination of <SPAN
CLASS="QUOTE"
><P
-> A now something a little more complex:</P
+> And now something a little more complex:</P
><P
-> <I
+> <SPAN
+CLASS="emphasis"
+><I
CLASS="EMPHASIS"
><TT
CLASS="LITERAL"
>/.*/adv((er)?ts?|ertis(ing|ements?))?/</TT
></I
+></SPAN
> -
We have several literal forward slashes again (<SPAN
CLASS="QUOTE"
CLASS="QUOTE"
>".*"</SPAN
>, so we are matching against any conceivable sub-path, just so
- it matches our expression. The only true literal that <I
+ it matches our expression. The only true literal that <SPAN
+CLASS="emphasis"
+><I
CLASS="EMPHASIS"
>must
match</I
+></SPAN
> our pattern is <SPAN
CLASS="APPLICATION"
>adv</SPAN
CLASS="QUOTE"
>"ing"</SPAN
>
- <I
+ <SPAN
+CLASS="emphasis"
+><I
CLASS="EMPHASIS"
>OR</I
+></SPAN
> <SPAN
CLASS="QUOTE"
>"ements?"</SPAN
>, which would then match
either spelling.</P
><P
-> <I
+> <SPAN
+CLASS="emphasis"
+><I
CLASS="EMPHASIS"
><TT
CLASS="LITERAL"
>/.*/advert[0-9]+\.(gif|jpe?g)</TT
></I
+></SPAN
> - Again
another path statement with forward slashes. Anything in the square brackets
<SPAN
> is not
in the expression anywhere).</P
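The Python sketch below shows how the three URL patterns above behave. It assumes that each pattern is tested against the path part of a URL, beginning at its first character. That is what `re.match` does: the match is anchored at the start and left open at the end. The sample paths and the helpers `path_matches` and `which_patterns` are hypothetical, and Python's `re` stands in for PCRE.

```python
import re

# The three example patterns from the text.
PATTERNS = {
    "banners": r"/.*/banners/.*",
    "adverts": r"/.*/adv((er)?ts?|ertis(ing|ements?))?/",
    "advert-images": r"/.*/advert[0-9]+\.(gif|jpe?g)",
}


def path_matches(pattern: str, path: str) -> bool:
    """Match from the start of `path`; anything may follow the match."""
    return re.match(pattern, path) is not None


def which_patterns(path: str) -> list[str]:
    """Names of all example patterns that match `path`."""
    return [name for name, pat in PATTERNS.items() if path_matches(pat, path)]


if __name__ == "__main__":
    for path in ["/img/banners/top.gif", "/x/adverts/1.gif", "/x/advertising/",
                 "/x/advice/", "/pics/advert42.jpeg", "/pics/advert.gif"]:
        print(path, which_patterns(path))
```

The sketch rejects "/x/advice/". After "adv", the optional group can match nothing, but the pattern then still requires a "/".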
><P
-> <I
-CLASS="EMPHASIS"
-><TT
-CLASS="LITERAL"
->s/microsoft(?!.com)/MicroSuck/i</TT
-></I
-> - This is
- a substitution. <SPAN
-CLASS="QUOTE"
->"MicroSuck"</SPAN
-> will replace any occurrence of
- <SPAN
-CLASS="QUOTE"
->"microsoft"</SPAN
->. The <SPAN
-CLASS="QUOTE"
->"i"</SPAN
-> at the end of the expression
- means ignore case. The <SPAN
-CLASS="QUOTE"
->"(?!.com)"</SPAN
-> means
- the match should fail if <SPAN
-CLASS="QUOTE"
->"microsoft"</SPAN
-> is followed by
- <SPAN
-CLASS="QUOTE"
->".com"</SPAN
->. In other words, this acts like a <SPAN
-CLASS="QUOTE"
->"NOT"</SPAN
->
- modifier. In case this is a hyperlink, we don't want to break it ;-).</P
-><P
> We are barely scratching the surface of regular expressions here so that you
can understand the default <SPAN
CLASS="APPLICATION"
TARGET="_top"
>http://www.perldoc.com/perl5.6/pod/perlre.html</A
></P
+><P
+> For information on regular expression based substitutions and their applications
+ in filters, please see the <A
+HREF="filter-file.html"
+>filter file tutorial</A
+>
+ in this manual.</P
></DIV
><DIV
CLASS="SECT2"
><H2
CLASS="SECT2"
><A
-NAME="AEN1613"
->9.2. <SPAN
+NAME="AEN4670"
+></A
+>14.2. <SPAN
CLASS="APPLICATION"
>Privoxy</SPAN
->'s Internal Pages</A
-></H2
+>'s Internal Pages</H2
><P
> Since <SPAN
CLASS="APPLICATION"
Privoxy main page:
</P
><A
-NAME="AEN1628"
+NAME="AEN4685"
></A
><BLOCKQUOTE
CLASS="BLOCKQUOTE"
</P
></BLOCKQUOTE
><P
-> Alternately, this may be reached at <A
+> There is a shortcut: <A
HREF="http://p.p/"
TARGET="_top"
>http://p.p/</A
->, but this
- variation may not work as reliably as the above in some configurations.
+> (However, it does not
+ fall back to a real page if the request is not
+ sent through <SPAN
+CLASS="APPLICATION"
+>Privoxy</SPAN
+>.)
</P
></LI
><LI
><P
>
- Show information about the current configuration:
+ Show information about the current configuration, including viewing and
+ editing of actions files:
</P
><A
-NAME="AEN1635"
+NAME="AEN4693"
></A
><BLOCKQUOTE
CLASS="BLOCKQUOTE"
Show the source code version numbers:
</P
><A
-NAME="AEN1640"
+NAME="AEN4698"
></A
><BLOCKQUOTE
CLASS="BLOCKQUOTE"
><LI
><P
>
- Show the client's request headers:
+ Show the browser's request headers:
</P
><A
-NAME="AEN1645"
+NAME="AEN4703"
></A
><BLOCKQUOTE
CLASS="BLOCKQUOTE"
Show which actions apply to a URL and why:
</P
><A
-NAME="AEN1650"
+NAME="AEN4708"
></A
><BLOCKQUOTE
CLASS="BLOCKQUOTE"
to run, but only as a pass-through proxy, with no actions taking place:
</P
><A
-NAME="AEN1656"
+NAME="AEN4714"
></A
><BLOCKQUOTE
CLASS="BLOCKQUOTE"
> Short cuts. Turn off, then on:
</P
><A
-NAME="AEN1660"
+NAME="AEN4718"
></A
><BLOCKQUOTE
CLASS="BLOCKQUOTE"
</P
></BLOCKQUOTE
><A
-NAME="AEN1663"
+NAME="AEN4721"
></A
><BLOCKQUOTE
CLASS="BLOCKQUOTE"
</P
></BLOCKQUOTE
></LI
-><LI
-><P
->
- Edit the actions list file:
- </P
-><A
-NAME="AEN1668"
-></A
-><BLOCKQUOTE
-CLASS="BLOCKQUOTE"
-><P
->
- <A
-HREF="http://config.privoxy.org/edit-actions"
-TARGET="_top"
->http://config.privoxy.org/edit-actions</A
->
- </P
-></BLOCKQUOTE
-></LI
></UL
></P
><P
-> These may be bookmarked for quick reference. </P
+> These may be bookmarked for quick reference. See also the next section. </P
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="BOOKMARKLETS"
->9.2.1. Bookmarklets</A
-></H3
+></A
+>14.2.1. Bookmarklets</H3
><P
> Below are some <SPAN
CLASS="QUOTE"
CLASS="QUOTE"
>"may not be safe"</SPAN
> - just click OK. Then you can run the
- Bookmarklet directly from your favourites/bookmarks. For even faster access,
+ Bookmarklet directly from your favorites/bookmarks. For even faster access,
you can put them on the <SPAN
CLASS="QUOTE"
>"Links"</SPAN
> <A
HREF="javascript:void(window.open('http://config.privoxy.org/toggle?mini=y&set=enabled','ijbstatus','width=250,height=100,resizable=yes,scrollbars=no,toolbar=no,location=no,directories=no,status=no,menubar=no,copyhistory=no').focus());"
TARGET="_top"
->Enable Privoxy</A
+>Privoxy - Enable</A
>
</P
></LI
> <A
HREF="javascript:void(window.open('http://config.privoxy.org/toggle?mini=y&set=disabled','ijbstatus','width=250,height=100,resizable=yes,scrollbars=no,toolbar=no,location=no,directories=no,status=no,menubar=no,copyhistory=no').focus());"
TARGET="_top"
->Disable Privoxy</A
+>Privoxy - Disable</A
>
</P
></LI
> <A
HREF="javascript:void(window.open('http://config.privoxy.org/toggle?mini=y&set=toggle','ijbstatus','width=250,height=100,resizable=yes,scrollbars=no,toolbar=no,location=no,directories=no,status=no,menubar=no,copyhistory=no').focus());"
TARGET="_top"
->Toggle Privoxy</A
+>Privoxy - Toggle</A
> (Toggles between enabled and disabled)
</P
></LI
> <A
HREF="javascript:void(window.open('http://config.privoxy.org/toggle?mini=y','ijbstatus','width=250,height=2,resizable=yes,scrollbars=no,toolbar=no,location=no,directories=no,status=no,menubar=no,copyhistory=no').focus());"
TARGET="_top"
->View Privoxy Status</A
+>Privoxy - View Status</A
>
</P
></LI
><LI
><P
> <A
-HREF="javascript:w=Math.floor(screen.width/2);h=Math.floor(screen.height*0.9);void(window.open('http://www.privoxy.org/actions','Feedback','screenx='+w+',width='+w+',height='+h+',scrollbars=yes,toolbar=no,location=no,directories=no,status=no,menubar=no,copyhistory=no').focus());"
+HREF="javascript:void(window.open('http://config.privoxy.org/show-url-info?url='+escape(location.href),'Why').focus());"
TARGET="_top"
->Actions file feedback system</A
+>Privoxy - Why?</A
>
</P
></LI
></UL
></P
><P
-> Credit: The site which gave me the general idea for these bookmarklets is
+> Credit: The site which gave us the general idea for these bookmarklets is
<A
-HREF="http://www.bookmarklets.com"
+HREF="http://www.bookmarklets.com/"
TARGET="_top"
>www.bookmarklets.com</A
>. They
><H2
CLASS="SECT2"
><A
-NAME="ACTIONSANAT"
->9.3. Anatomy of an Action</A
-></H2
+NAME="CHAIN"
+></A
+>14.3. Chain of Events</H2
><P
-> The way <SPAN
+> Let's take a quick look at the basic sequence of events when a web page is
+ requested by your browser and <SPAN
+CLASS="APPLICATION"
+>Privoxy</SPAN
+> is on duty:</P
+><P
+> <P
+></P
+><UL
+><LI
+><P
+> First, your web browser requests a web page. The browser knows to send
+ the request to <SPAN
CLASS="APPLICATION"
>Privoxy</SPAN
-> applies <SPAN
+>, which will in turn
+ relay the request to the remote web server after passing the following
+ tests:
+ </P
+></LI
+><LI
+><P
+> <SPAN
+CLASS="APPLICATION"
+>Privoxy</SPAN
+> traps any request for its own internal CGI
+ pages (e.g. http://p.p/) and sends the CGI page back to the browser.
+ </P
+></LI
+><LI
+><P
+> Next, <SPAN
+CLASS="APPLICATION"
+>Privoxy</SPAN
+> checks to see if the URL
+ matches any <A
+HREF="actions-file.html#BLOCK"
+><SPAN
CLASS="QUOTE"
->"actions"</SPAN
+>"+block"</SPAN
+></A
+> patterns. If
+ so, the URL is then blocked, and the remote web server will not be contacted.
+ <A
+HREF="actions-file.html#HANDLE-AS-IMAGE"
+><SPAN
+CLASS="QUOTE"
+>"+handle-as-image"</SPAN
+></A
+>
+ is then checked and if it does not match, an
+ HTML <SPAN
+CLASS="QUOTE"
+>"BLOCKED"</SPAN
+> page is sent back. Otherwise, if it does match,
+ an image is returned. The type of image depends on the setting of <A
+HREF="actions-file.html#SET-IMAGE-BLOCKER"
+><SPAN
+CLASS="QUOTE"
+>"+set-image-blocker"</SPAN
+></A
+>
+ (blank, checkerboard pattern, or an HTTP redirect to an image elsewhere).
+ </P
+></LI
+><LI
+><P
+> Untrusted URLs are blocked. If URLs are being added to the
+ <TT
+CLASS="FILENAME"
+>trust</TT
+> file, that is done at this point.
+ </P
+></LI
+><LI
+><P
+> If the URL pattern matches the <A
+HREF="actions-file.html#FAST-REDIRECTS"
+><SPAN
+CLASS="QUOTE"
+>"+fast-redirects"</SPAN
+></A
+> action,
+ it is then processed. Unwanted parts of the requested URL are stripped.
+ </P
+></LI
+><LI
+><P
+> Now the rest of the client browser's request headers are processed. If any
+ of these match any of the relevant actions (e.g. <A
+HREF="actions-file.html#HIDE-USER-AGENT"
+><SPAN
+CLASS="QUOTE"
+>"+hide-user-agent"</SPAN
+></A
+>,
+ etc.), headers are suppressed or forged as determined by these actions and
+ their parameters.
+ </P
+></LI
+><LI
+><P
+> Now the web server starts sending its response back (typically a web page and related
+ data).
+ </P
+></LI
+><LI
+><P
+> First, the server headers are read and processed to determine, among other
+ things, the MIME type (document type) and encoding. The headers are then
+ filtered as determined by the
+ <A
+HREF="actions-file.html#CRUNCH-INCOMING-COOKIES"
+><SPAN
+CLASS="QUOTE"
+>"+crunch-incoming-cookies"</SPAN
+></A
+>,
+ <A
+HREF="actions-file.html#SESSION-COOKIES-ONLY"
+><SPAN
+CLASS="QUOTE"
+>"+session-cookies-only"</SPAN
+></A
+>,
+ and <A
+HREF="actions-file.html#DOWNGRADE-HTTP-VERSION"
+><SPAN
+CLASS="QUOTE"
+>"+downgrade-http-version"</SPAN
+></A
>
- and <SPAN
+ actions.
+ </P
+></LI
+><LI
+><P
+> If the <A
+HREF="actions-file.html#KILL-POPUPS"
+><SPAN
CLASS="QUOTE"
->"filters"</SPAN
-> to any given URL can be complex, and not always so
+>"+kill-popups"</SPAN
+></A
+>
+ action applies, and it is an HTML or JavaScript document, the popup-code in the
+ response is filtered on-the-fly as it is received.
+ </P
+></LI
+><LI
+><P
+> If a <A
+HREF="actions-file.html#FILTER"
+><SPAN
+CLASS="QUOTE"
+>"+filter"</SPAN
+></A
+>
+ or <A
+HREF="actions-file.html#DEANIMATE-GIFS"
+><SPAN
+CLASS="QUOTE"
+>"+deanimate-gifs"</SPAN
+></A
+>
+ action applies (and the document type fits the action), the rest of the page is
+ read into memory (up to a configurable limit). Then the filter rules (from
+ <TT
+CLASS="FILENAME"
+>default.filter</TT
+> and any other filter files) are
+ processed against the buffered content. Filters are applied in the order
+ they are specified in one of the filter files. Animated GIFs, if present,
+ are reduced to either the first or last frame, depending on the action
+ setting. The entire page, which is now filtered, is then sent by
+ <SPAN
+CLASS="APPLICATION"
+>Privoxy</SPAN
+> back to your browser.
+ </P
+><P
+> If neither <A
+HREF="actions-file.html#FILTER"
+><SPAN
+CLASS="QUOTE"
+>"+filter"</SPAN
+></A
+>
+ nor <A
+HREF="actions-file.html#DEANIMATE-GIFS"
+><SPAN
+CLASS="QUOTE"
+>"+deanimate-gifs"</SPAN
+></A
+>
+ matches, then <SPAN
+CLASS="APPLICATION"
+>Privoxy</SPAN
+> passes the raw data through
+ to the client browser as it becomes available.
+ </P
+></LI
+><LI
+><P
+> As the browser receives the now (probably filtered) page content, it
+ reads and then requests any URLs that may be embedded within the page
+ source, e.g. ad images, stylesheets, JavaScript, other HTML documents (e.g.
+ frames), sounds, etc. For each of these objects, the browser issues a new
+ request. And each such request is in turn processed as above. Note that a
+ complex web page may have many such embedded URLs.
+ </P
+></LI
+></UL
+></P
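The first few steps of this chain can be pictured as a small decision function. The Python sketch below is a toy model, not Privoxy's code. The `Actions` class, its field names, and the `first_decision` helper are hypothetical. Treating `p.p` and `config.privoxy.org` as the hosts Privoxy answers itself is an assumption based on the internal pages listed earlier. Everything after the block check is left out: trust, fast-redirects, and header and content filtering.

```python
from dataclasses import dataclass


@dataclass
class Actions:
    """Hypothetical summary of the actions that apply to one URL."""
    block: bool = False
    handle_as_image: bool = False
    image_blocker: str = "pattern"  # e.g. "blank" or "pattern"


# Assumed set of host names that Privoxy answers itself.
INTERNAL_HOSTS = {"p.p", "config.privoxy.org"}


def first_decision(host: str, actions: Actions) -> str:
    """What happens to a request, following the order of the list above."""
    if host in INTERNAL_HOSTS:
        return "internal CGI page"
    if actions.block:
        # +handle-as-image decides between an image and an HTML page.
        if actions.handle_as_image:
            return f"image ({actions.image_blocker})"
        return "BLOCKED page"
    return "forward to remote server"


if __name__ == "__main__":
    print(first_decision("p.p", Actions()))
    print(first_decision("ad.example.com", Actions(block=True)))
    print(first_decision("ad.example.com", Actions(block=True, handle_as_image=True)))
    print(first_decision("www.example.org", Actions()))
```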
+></DIV
+><DIV
+CLASS="SECT2"
+><H2
+CLASS="SECT2"
+><A
+NAME="ACTIONSANAT"
+></A
+>14.4. Anatomy of an Action</H2
+><P
+> The way <SPAN
+CLASS="APPLICATION"
+>Privoxy</SPAN
+> applies
+ <A
+HREF="actions-file.html#ACTIONS"
+>actions</A
+> and <A
+HREF="actions-file.html#FILTER"
+>filters</A
+>
+ to any given URL can be complex, and not always so
easy to understand what is happening. And sometimes we need to be able to
- <I
+ <SPAN
+CLASS="emphasis"
+><I
CLASS="EMPHASIS"
>see</I
+></SPAN
> just what <SPAN
CLASS="APPLICATION"
>Privoxy</SPAN
CLASS="APPLICATION"
>Privoxy</SPAN
> is doing
- is causing us a problem inadvertantly. It can be a little daunting to look at
+ is causing us a problem inadvertently. It can be a little daunting to look at
the actions and filters files themselves, since they tend to be filled with
- <SPAN
-CLASS="QUOTE"
->"regular expressions"</SPAN
-> whose consequences are not always
- so obvious. <SPAN
+ <A
+HREF="appendix.html#REGEX"
+>regular expressions</A
+> whose consequences are not
+ always so obvious. </P
+><P
+> One quick test to see if <SPAN
+CLASS="APPLICATION"
+>Privoxy</SPAN
+> is causing a problem
+ or not, is to disable it temporarily. This should be the first troubleshooting
+ step. See <A
+HREF="appendix.html#BOOKMARKLETS"
+>the Bookmarklets</A
+> section for a quick
+ and easy way to do this (be sure to flush caches afterward!). Looking at the
+ logs is a good idea too.</P
+><P
+> <SPAN
CLASS="APPLICATION"
>Privoxy</SPAN
-> provides the
+> also provides the
<A
HREF="http://config.privoxy.org/show-url-info"
TARGET="_top"
CLASS="APPLICATION"
>actions</SPAN
>
- are being applied to any given URL. This is a big help for troubleshooting.
- </P
+ are being applied to any given URL. This is a big help for troubleshooting.</P
><P
> First, enter one URL (or partial URL) at the prompt, and then
<SPAN
>Privoxy</SPAN
> will tell us
how the current configuration will handle it. This will not
- help with filtering effects from the <TT
-CLASS="FILENAME"
->default.filter</TT
-> file! It
- also will not tell you about any other URLs that may be embedded within the
- URL you are testing. For instance, images such as ads are expressed as URLs
- within the raw page source of HTML pages. So you will only get info for the
- actual URL that is pasted into the prompt area -- not any sub-URLs. If you
- want to know about embedded URLs like ads, you will have to dig those out of
- the HTML source. Use your browser's <SPAN
-CLASS="QUOTE"
->"View Page Source"</SPAN
-> option
- for this. Or right click on the ad, and grab the URL.</P
-><P
-> Let's look at an example, <A
+ help with filtering effects (i.e. the <A
+HREF="actions-file.html#FILTER"
+><SPAN
+CLASS="QUOTE"
+>"+filter"</SPAN
+></A
+> action) from
+ one of the filter files, since filtering is handled very
+ differently and is not so easy to trap! It also will not tell you about any other
+ URLs that may be embedded within the URL you are testing. For instance, images
+ such as ads are expressed as URLs within the raw page source of HTML pages. So
+ you will only get info for the actual URL that is pasted into the prompt area
+ -- not any sub-URLs. If you want to know about embedded URLs like ads, you
+ will have to dig those out of the HTML source. Use your browser's <SPAN
+CLASS="QUOTE"
+>"View
+ Page Source"</SPAN
+> option for this. Or right click on the ad, and grab the
+ URL.</P
+><P
+> Let's try an example, <A
HREF="http://google.com"
TARGET="_top"
>google.com</A
>,
- one section at a time:</P
-><P
-> <TABLE
-BORDER="0"
-BGCOLOR="#E0E0E0"
-WIDTH="100%"
-><TR
-><TD
-><PRE
-CLASS="SCREEN"
-> System default actions:
-
- { -add-header -block -deanimate-gifs -downgrade -fast-redirects -filter
- -hide-forwarded -hide-from -hide-referer -hide-user-agent -image
- -image-blocker -limit-connect -no-compression -no-cookies-keep
- -no-cookies-read -no-cookies-set -no-popups -vanilla-wafer -wafer }
-
- </PRE
-></TD
-></TR
-></TABLE
-></P
-><P
-> This is the top section, and only tells us of the compiled in defaults. This
- is basically what <SPAN
-CLASS="APPLICATION"
->Privoxy</SPAN
-> would do if there
- were not any <SPAN
-CLASS="QUOTE"
->"actions"</SPAN
-> defined, i.e. it does nothing. Every action
- is disabled. This is not particularly informative for our purposes here. OK,
- next section:</P
+ and look at it one section at a time in a sample configuration (your real
+ configuration may vary):</P
><P
> <TABLE
BORDER="0"
><TD
><PRE
CLASS="SCREEN"
-> Matches for http://google.com:
+> Matches for http://google.com:
- { -add-header -block +deanimate-gifs -downgrade +fast-redirects
- +filter{html-annoyances} +filter{js-annoyances} +filter{no-popups}
- +filter{webbugs} +filter{nimda} +filter{banners-by-size} +filter{hal}
- +filter{fun} +hide-forwarded +hide-from{block} +hide-referer{forge}
- -hide-user-agent -image +image-blocker{blank} +no-compression
- +no-cookies-keep -no-cookies-read -no-cookies-set +no-popups
- -vanilla-wafer -wafer }
- /
+ In file: default.action <SPAN
+CLASS="GUIBUTTON"
+>[ View ]</SPAN
+> <SPAN
+CLASS="GUIBUTTON"
+>[ Edit ]</SPAN
+>
- { -no-cookies-keep -no-cookies-read -no-cookies-set }
- .google.com
+ {-add-header
+ -block
+ -content-type-overwrite
+ -crunch-client-header
+ -crunch-if-none-match
+ -crunch-incoming-cookies
+ -crunch-outgoing-cookies
+ -crunch-server-header
+ +deanimate-gifs {last}
+ -downgrade-http-version
+ +fast-redirects {check-decoded-url}
+ -filter {js-events}
+ -filter {content-cookies}
+ -filter {all-popups}
+ -filter {banners-by-link}
+ -filter {tiny-textforms}
+ -filter {frameset-borders}
+ -filter {demoronizer}
+ -filter {shockwave-flash}
+ -filter {quicktime-kioskmode}
+ -filter {fun}
+ -filter {crude-parental}
+ -filter {site-specifics}
+ +filter {js-annoyances}
+ +filter {html-annoyances}
+ +filter {refresh-tags}
+ +filter {unsolicited-popups}
+ +filter {img-reorder}
+ +filter {banners-by-size}
+ +filter {webbugs}
+ +filter {jumping-windows}
+ +filter {ie-exploits}
+ -filter-client-headers
+ -filter-server-headers
+ -force-text-mode
+ -handle-as-empty-document
+ -handle-as-image
+ -hide-accept-language
+ -hide-content-disposition
+ +hide-forwarded-for-headers
+ +hide-from-header {block}
+ -hide-if-modified-since
+ +hide-referrer {forge}
+ -hide-user-agent
+ -inspect-jpegs
+ -kill-popups
+ -limit-connect
+ -overwrite-last-modified
+ +prevent-compression
+ -redirect
+ -send-vanilla-wafer
+ -send-wafer
+ +session-cookies-only
+ +set-image-blocker {pattern}
+ -treat-forbidden-connects-like-blocks }
+/
+
+ { -session-cookies-only }
+ .google.com
{ -fast-redirects }
- .google.com
+ .google.com
- </PRE
+In file: user.action <SPAN
+CLASS="GUIBUTTON"
+>[ View ]</SPAN
+> <SPAN
+CLASS="GUIBUTTON"
+>[ Edit ]</SPAN
+>
+(no matches in this file) </PRE
></TD
></TR
></TABLE
></P
><P
-> This is much more informative, and tells us how we have defined our
- <SPAN
+> This is telling us how we have defined our
+ <A
+HREF="actions-file.html#ACTIONS"
+><SPAN
CLASS="QUOTE"
>"actions"</SPAN
->, and which ones match for our example,
- <SPAN
+></A
+>, and
+ which ones match for our test case, <SPAN
CLASS="QUOTE"
>"google.com"</SPAN
->. The first grouping shows our default
- settings, which would apply to all URLs. If you look at your <SPAN
+>.
+ Displayed are all the actions that are available to us. Remember,
+ the <TT
+CLASS="LITERAL"
+>+</TT
+> sign denotes <SPAN
CLASS="QUOTE"
->"actions"</SPAN
+>"on"</SPAN
+>. <TT
+CLASS="LITERAL"
+>-</TT
>
- file, this would be the section just below the <SPAN
+ denotes <SPAN
+CLASS="QUOTE"
+>"off"</SPAN
+>. So some are <SPAN
+CLASS="QUOTE"
+>"on"</SPAN
+> here, but many
+ are <SPAN
+CLASS="QUOTE"
+>"off"</SPAN
+>. Each example we try may provide a slightly different
+ end result, depending on our configuration directives.</P
+><P
+> The first listing
+ shows any matches for the <TT
+CLASS="FILENAME"
+>standard.action</TT
+> file. No hits at
+ all here on <SPAN
+CLASS="QUOTE"
+>"standard"</SPAN
+>. Then next is <SPAN
+CLASS="QUOTE"
+>"default"</SPAN
+>, or
+ our <TT
+CLASS="FILENAME"
+>default.action</TT
+> file. The large, multi-line listing
+ shows how the actions are set for all URLs, i.e. our default settings.
+ If you look at your <SPAN
+CLASS="QUOTE"
+>"actions"</SPAN
+> file, this would be the section
+ just below the <SPAN
CLASS="QUOTE"
>"aliases"</SPAN
-> section
- near the top. This applies to all URLs as signified by the single forward
- slash -- <SPAN
+> section near the top. This will apply to
+ all URLs as signified by the single forward slash at the end of the listing
+ -- <SPAN
CLASS="QUOTE"
>"/"</SPAN
->.
- </P
+>.</P
><P
-> These are the default actions we have enabled. But we can define additional
- actions that would be exceptions to these general rules, and then list
- specific URLs that these exceptions would apply to. Last match wins.
- Just below this then are two explict matches for <SPAN
+> But we can define additional actions that would be exceptions to these general
+ rules, and then list specific URLs (or patterns) that these exceptions would
+ apply to. Last match wins. Just below this then are two explicit matches for
+ <SPAN
CLASS="QUOTE"
>".google.com"</SPAN
->.
- The first is negating our various cookie blocking actions (i.e. we will allow
- cookies here). The second is allowing <SPAN
+>. The first is negating our previous cookie setting,
+ which was for <A
+HREF="actions-file.html#SESSION-COOKIES-ONLY"
+><SPAN
CLASS="QUOTE"
->"fast-redirects"</SPAN
->. Note
- that there is a leading dot here -- <SPAN
+>"+session-cookies-only"</SPAN
+></A
+>
+ (i.e. not persistent). So we will allow persistent cookies for google, at
+ least that is how it is in this example. The second turns
+ <SPAN
+CLASS="emphasis"
+><I
+CLASS="EMPHASIS"
+>off</I
+></SPAN
+> any
+ <A
+HREF="actions-file.html#FAST-REDIRECTS"
+><SPAN
+CLASS="QUOTE"
+>"+fast-redirects"</SPAN
+></A
+>
+ action, allowing this to take place unmolested. Note that there is a leading
+ dot here -- <SPAN
CLASS="QUOTE"
>".google.com"</SPAN
->. This will
- match any hosts and sub-domains, in the google.com domain also, such as
+>. This will match any hosts and
+ sub-domains, in the google.com domain also, such as
<SPAN
CLASS="QUOTE"
>"www.google.com"</SPAN
->. So, apparently, we have these actions defined
- somewhere in the lower part of our actions file, and
- <SPAN
+>. So, apparently, we have these two actions
+ defined somewhere in the lower part of our <TT
+CLASS="FILENAME"
+>default.action</TT
+>
+ file, and <SPAN
CLASS="QUOTE"
>"google.com"</SPAN
-> is referenced in these sections. </P
+> is referenced somewhere in these latter
+ sections.</P
><P
-> And now we pull it altogether in the bottom section and summarize how
+> Then, for our <TT
+CLASS="FILENAME"
+>user.action</TT
+> file, we again have no hits.
+ So there is nothing google-specific that we might have added to our own, local
+ configuration.</P
+><P
+> And finally we pull it all together in the bottom section and summarize how
<SPAN
CLASS="APPLICATION"
>Privoxy</SPAN
-> is appying all its <SPAN
+> is applying all its <SPAN
CLASS="QUOTE"
>"actions"</SPAN
>
><PRE
CLASS="SCREEN"
> Final results:
-
- -add-header -block -deanimate-gifs -downgrade -fast-redirects
- +filter{html-annoyances} +filter{js-annoyances} +filter{no-popups}
- +filter{webbugs} +filter{nimda} +filter{banners-by-size} +filter{hal}
- +filter{fun} +hide-forwarded +hide-from{block} +hide-referer{forge}
- -hide-user-agent -image +image-blocker{blank} -limit-connect +no-compression
- -no-cookies-keep -no-cookies-read -no-cookies-set +no-popups -vanilla-wafer
- -wafer
-
- </PRE
+
+ -add-header
+ -block
+ -content-type-overwrite
+ -crunch-client-header
+ -crunch-if-none-match
+ -crunch-incoming-cookies
+ -crunch-outgoing-cookies
+ -crunch-server-header
+ +deanimate-gifs {last}
+ -downgrade-http-version
+ -fast-redirects
+ +filter {js-annoyances}
+ +filter {html-annoyances}
+ +filter {refresh-tags}
+ +filter {unsolicited-popups}
+ +filter {img-reorder}
+ +filter {banners-by-size}
+ +filter {webbugs}
+ +filter {jumping-windows}
+ +filter {ie-exploits}
+ -filter-client-headers
+ -filter-server-headers
+ -force-text-mode
+ -handle-as-empty-document
+ -handle-as-image
+ -hide-accept-language
+ -hide-content-disposition
+ +hide-forwarded-for-headers
+ +hide-from-header {block}
+ -hide-if-modified-since
+ +hide-referrer {forge}
+ -hide-user-agent
+ -inspect-jpegs
+ -kill-popups
+ -limit-connect
+ -overwrite-last-modified
+ +prevent-compression
+ -redirect
+ -send-vanilla-wafer
+ -send-wafer
+ -session-cookies-only
+ +set-image-blocker {pattern}
+ -treat-forbidden-connects-like-blocks </PRE
></TD
></TR
></TABLE
></P
><P
+> Notice that, compared with the default settings in the previous listing,
+ the only differences are in <SPAN
+CLASS="QUOTE"
+>"fast-redirects"</SPAN
+> and <SPAN
+CLASS="QUOTE"
+>"session-cookies-only"</SPAN
+>,
+ which are turned off specifically for this site in our configuration,
+ and thus show up as off in the <SPAN
+CLASS="QUOTE"
+>"Final Results"</SPAN
+>.</P
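The "last match wins" rule that produces these final results can be sketched as a simple merge. This Python illustration makes several simplifications. The default section is cut down to three actions, and parameters such as {check-decoded-url} are dropped. The leading-dot host rule is reduced to "the domain itself or any sub-domain". The dictionaries and function names are hypothetical.

```python
# Default section for "/" (cut down to three actions for brevity).
DEFAULTS = {
    "fast-redirects": True,
    "prevent-compression": True,
    "session-cookies-only": True,
}

# Later sections in file order: (changes to apply, host pattern).
SECTIONS = [
    ({"session-cookies-only": False}, ".google.com"),
    ({"fast-redirects": False}, ".google.com"),
]


def host_matches(pattern: str, host: str) -> bool:
    """Simplified: ".example.com" covers example.com and its sub-domains."""
    if pattern.startswith("."):
        return host == pattern[1:] or host.endswith(pattern)
    return host == pattern


def final_actions(host: str) -> dict[str, bool]:
    """Apply every matching section in order; later matches override earlier ones."""
    result = dict(DEFAULTS)
    for changes, pattern in SECTIONS:
        if host_matches(pattern, host):
            result.update(changes)
    return result


if __name__ == "__main__":
    for host in ["google.com", "www.google.com", "example.org"]:
        print(host, final_actions(host))
```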
+><P
> Now another example, <SPAN
CLASS="QUOTE"
>"ad.doubleclick.net"</SPAN
><TD
><PRE
CLASS="SCREEN"
-> { +block +image }
+> { +block +handle-as-image }
.ad.doubleclick.net
- { +block +image }
+ { +block +handle-as-image }
ad*.
- { +block +image }
- .doubleclick.net
-
- </PRE
+ { +block +handle-as-image }
+ .doubleclick.net</PRE
></TD
></TR
></TABLE
> We'll just show the interesting part here, the explicit matches. It is
matched three different times. Each as an <SPAN
CLASS="QUOTE"
->"+block +image"</SPAN
+>"+block +handle-as-image"</SPAN
>,
which is the expanded form of one of our aliases that had been defined as:
<SPAN
CLASS="QUOTE"
>"+imageblock"</SPAN
->. (<SPAN
+>. (<A
+HREF="actions-file.html#ALIASES"
+><SPAN
CLASS="QUOTE"
>"Aliases"</SPAN
-> are defined in the
- first section of the actions file and typically used to combine more
+></A
+> are defined in
+ the first section of the actions file and typically used to combine more
than one action.)</P
><P
> Any one of these would have done the trick and blocked this as an unwanted
CLASS="QUOTE"
>"ad.doubleclick.net"</SPAN
>
- is done here -- as both a <SPAN
+ is done here -- as both a <A
+HREF="actions-file.html#BLOCK"
+><SPAN
CLASS="QUOTE"
>"+block"</SPAN
-> <I
+></A
+>
+ <SPAN
+CLASS="emphasis"
+><I
CLASS="EMPHASIS"
>and</I
-> an
- <SPAN
+></SPAN
+> an
+ <A
+HREF="actions-file.html#HANDLE-AS-IMAGE"
+><SPAN
CLASS="QUOTE"
->"+image"</SPAN
->. The custom alias <SPAN
+>"+handle-as-image"</SPAN
+></A
+>.
+ The custom alias <SPAN
CLASS="QUOTE"
>"+imageblock"</SPAN
-> does this
- for us.</P
+> just simplifies the process and makes
+ it more readable.</P
><P
> One last example. Let's try <SPAN
CLASS="QUOTE"
>"http://www.rhapsodyk.net/adsl/HOWTO/"</SPAN
>.
- This one is giving us problems. We are getting a blank page. Hmmm...</P
+ This one is giving us problems. We are getting a blank page. Hmmm ...</P
><P
> <TABLE
BORDER="0"
CLASS="SCREEN"
> Matches for http://www.rhapsodyk.net/adsl/HOWTO/:
- { -add-header -block +deanimate-gifs -downgrade +fast-redirects
- +filter{html-annoyances} +filter{js-annoyances} +filter{no-popups}
- +filter{webbugs} +filter{nimda} +filter{banners-by-size} +filter{hal}
- +filter{fun} +hide-forwarded +hide-from{block} +hide-referer{forge}
- -hide-user-agent -image +image-blocker{blank} +no-compression
- +no-cookies-keep -no-cookies-read -no-cookies-set +no-popups
- -vanilla-wafer -wafer }
- /
+ In file: default.action <SPAN
+CLASS="GUIBUTTON"
+>[ View ]</SPAN
+> <SPAN
+CLASS="GUIBUTTON"
+>[ Edit ]</SPAN
+>
- { +block +image }
- /ads
+ {-add-header
+ -block
+ -content-type-overwrite
+ -crunch-client-header
+ -crunch-if-none-match
+ -crunch-incoming-cookies
+ -crunch-outgoing-cookies
+ -crunch-server-header
+ +deanimate-gifs
+ -downgrade-http-version
+ +fast-redirects{check-decoded-url}
+ +filter{html-annoyances}
+ +filter{js-annoyances}
+ +filter{kill-popups}
+ +filter{webbugs}
+ +filter{nimda}
+ +filter{banners-by-size}
+ +filter{hal}
+ +filter{fun}
+ -filter-client-headers
+ -filter-server-headers
+ -force-text-mode
+ -handle-as-empty-document
+ -handle-as-image
+ -hide-accept-language
+ -hide-content-disposition
+ +hide-forwarded-for-headers
+ +hide-from-header{block}
+ +hide-referer{forge}
+ -hide-user-agent
+ -inspect-jpegs
+ +kill-popups
+ -overwrite-last-modified
+ +prevent-compression
+ -redirect
+ -send-vanilla-wafer
+ -send-wafer
+ +session-cookies-only
+ +set-image-blocker{blank}
+ -treat-forbidden-connects-like-blocks }
+ /
- </PRE
+ { +block +handle-as-image }
+ /ads</PRE
></TD
></TR
></TABLE
> is matching <SPAN
CLASS="QUOTE"
>"/ads"</SPAN
->! But
- we did not want this at all! Now we see why we get the blank page. We could
- now add a new action below this that explictly does <I
+> in our
+ configuration! But we did not want this at all! Now we see why we get the
+ blank page. We could now add a new action below this that explicitly
+ <SPAN
+CLASS="emphasis"
+><I
CLASS="EMPHASIS"
->not</I
->
- block (-block) pages with <SPAN
+>un</I
+></SPAN
+>blocks (<SPAN
+CLASS="QUOTE"
+>"{-block}"</SPAN
+>) paths with
+ <SPAN
CLASS="QUOTE"
>"adsl"</SPAN
->. There are various ways to
- handle such exceptions. Example:</P
+> in them (remember, the last match in the configuration wins).
+ There are various ways to handle such exceptions. Example:</P
><P
> <TABLE
BORDER="0"
><PRE
CLASS="SCREEN"
> { -block }
- /adsl
-
- </PRE
+ /adsl</PRE
></TD
></TR
></TABLE
><TD
><PRE
CLASS="SCREEN"
-> { -block }
- /adsl
-
- </PRE
+> { +block +handle-as-image }
+ /ads</PRE
></TD
></TR
></TABLE
One likely cause would be one of the <SPAN
CLASS="QUOTE"
>"{+filter}"</SPAN
-> actions. Try
- adding the URL for the site to one of aliases that turn off <SPAN
+> actions. These
+ tend to be harder to troubleshoot. Try adding the URL for the site to one of
+ the aliases that turn off <SPAN
CLASS="QUOTE"
>"+filter"</SPAN
>:</P
.worldpay.com # for quietpc.com
.jungle.com
.scan.co.uk
- .forbes.com
-
- </PRE
+ .forbes.com</PRE
></TD
></TR
></TABLE
> that expands to
<SPAN
CLASS="QUOTE"
->"{ -filter -no-cookies -no-cookies-keep }"</SPAN
->. Or you could do
- your own exception to negate filtering: </P
+>"{ -filter -session-cookies-only }"</SPAN
+>.
+ Or you could do your own exception to negate filtering: </P
><P
> <TABLE
BORDER="0"
><PRE
CLASS="SCREEN"
> {-filter}
- .forbes.com
-
- </PRE
+ .forbes.com</PRE
></TD
></TR
></TABLE
></P
><P
+> This would turn off all filtering for that site. Such an exception is probably
+ best put in <TT
+CLASS="FILENAME"
+>user.action</TT
+>, for local site
+ exceptions.</P
+><P
+> Images that are inexplicably being blocked may well be hitting the
+ <SPAN
+CLASS="QUOTE"
+>"+filter{banners-by-size}"</SPAN
+> rule, which assumes
+ that images of certain sizes are ad banners (works well most of the time
+ since these tend to be standardized).</P
+><P
> <SPAN
CLASS="QUOTE"
>"{fragile}"</SPAN
><HR
ALIGN="LEFT"
WIDTH="100%"><TABLE
+SUMMARY="Footer navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
VALIGN="top"
><A
HREF="seealso.html"
+ACCESSKEY="P"
>Prev</A
></TD
><TD
VALIGN="top"
><A
HREF="index.html"
+ACCESSKEY="H"
>Home</A
></TD
><TD