-</sect1>
-
-<!-- ~ End section ~ -->
-
-
-
-<!-- ~~~~~ New section ~~~~~ -->
-
-<sect1 id="templates">
-<title>Privoxy's Template Files</title>
-<para>
- All <application>Privoxy</application> built-in pages, i.e. error pages such as the
- <ulink url="http://show-the-404-error.page"><quote>404 - No Such Domain</quote>
- error page</ulink>, the <ulink
- url="http://ads.bannerserver.example.com/nasty-ads/sponsor.html"><quote>BLOCKED</quote>
- page</ulink>
- and all pages of its <ulink url="http://config.privoxy.org/">web-based
- user interface</ulink>, are generated from <emphasis>templates</emphasis>.
- (<application>Privoxy</application> must be running for the above links to work as
- intended.)
-</para>
-
-<para>
- These templates are stored in a subdirectory of the <link linkend="confdir">configuration
- directory</link> called <filename>templates</filename>. On Unixish platforms,
- this is typically
- <ulink url="file:///etc/privoxy/templates/"><filename>/etc/privoxy/templates/</filename></ulink>.
-</para>
-
-<para>
- The templates are basically normal HTML files, but with place-holders (called symbols
- or exports), which <application>Privoxy</application> fills at run time. It
- is possible to edit the templates with a normal text editor, should you want
- to customize them. (<emphasis>Not recommended for the casual
- user</emphasis>). Should you create your own custom templates, you should use
- the <filename>config</filename> setting <link linkend="templdir">templdir</link>
- to specify an alternate location, so your templates do not get overwritten
- during upgrades.
- </para>
- <para>
- Note that just like in configuration files, lines starting
- with <literal>#</literal> are ignored when the templates are filled in.
-</para>
-
-<para>
- The place-holders are of the form <literal>@name@</literal>, and you will
- find a list of available symbols, which vary from template to template,
- in the comments at the start of each file. Note that these comments are not
- always accurate, and that it's probably best to look at the existing HTML
- code to find out which symbols are supported and what they are filled in with.
-</para>
-
-<para>
- A special application of this substitution mechanism is to make whole
- blocks of HTML code disappear when a specific symbol is set. We use this
- for many purposes, one of them being to include the beta warning in all
- our user interface (CGI) pages when <application>Privoxy</application>
- is in an alpha or beta development stage:
-</para>
-
-<para>
- <screen>
-&lt;!-- @if-unstable-start --&gt;
-
- ... beta warning HTML code goes here ...
-
-&lt;!-- if-unstable-end@ --&gt;</screen>
-</para>
-
-<para>
- If the "unstable" symbol is set, everything in between and including
- <literal>@if-unstable-start</literal> and <literal>if-unstable-end@</literal>
- will disappear, leaving nothing but an empty comment:
-</para>
-
-<para>
- <screen>&lt;!-- --&gt;</screen>
-</para>
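-
-<para>
- The substitution mechanism described above can be sketched in a few lines of
- Python. This is only an illustration and not <application>Privoxy's</application>
- actual code: the function name <literal>fill_template</literal>, its parameters
- and the sample template are assumptions made for this example, and markers of
- blocks that are kept are simply left in place.
-</para>

```python
import re


def fill_template(template, exports, killed_blocks):
    """Hypothetical sketch of template filling; not Privoxy's implementation."""
    # Lines starting with '#' are ignored, as in the configuration files.
    lines = [line for line in template.splitlines() if not line.startswith("#")]
    text = "\n".join(lines)

    # For every symbol that is set, remove the whole block including both
    # markers, which leaves only the surrounding (now empty) HTML comment.
    for name in killed_blocks:
        block = re.compile(
            "@if-" + re.escape(name) + "-start.*?if-" + re.escape(name) + "-end@",
            re.DOTALL,
        )
        text = block.sub("", text)

    # Fill in the @name@ place-holders.
    for name, value in exports.items():
        text = text.replace("@" + name + "@", value)
    return text


template = """# This comment line is dropped
<h1>@title@</h1>
<!-- @if-unstable-start -->
<p>beta warning</p>
<!-- if-unstable-end@ -->"""

result = fill_template(template, {"title": "Privoxy"}, {"unstable"})
print(result)
```

-<para>
- In this sketch the heading place-holder is filled with the exported value and
- the block between the two markers collapses to an empty HTML comment.
-</para>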
-
-<para>
- There's also an if-then-else construct and an <literal>#include</literal>
- mechanism, but you will surely find out about those if you are inclined to edit the
- templates ;-)
-</para>
-
-<para>
- All templates refer to a style located at
- <ulink url="http://config.privoxy.org/send-stylesheet"><literal>http://config.privoxy.org/send-stylesheet</literal></ulink>.
- This is, of course, locally served by <application>Privoxy</application>
- and the source for it can be found and edited in the
- <filename>cgi-style.css</filename> template.
-</para>
-
-</sect1>
-
-<!-- ~ End section ~ -->
-
-
-
-<!-- ~~~~~ New section ~~~~~ -->
-
-<sect1 id="contact"><title>Contacting the Developers, Bug Reporting and Feature
-Requests</title>
-
-<!-- Include contacting.sgml boilerplate: -->
- &contacting;
-<!-- end boilerplate -->
-
-</sect1>
-
-<!-- ~ End section ~ -->
-
-
-<!-- ~~~~~ New section ~~~~~ -->
-<sect1 id="copyright"><title>Privoxy Copyright, License and History</title>
-
-<!-- Include copyright.sgml: -->
- &copyright;
-<!-- end copyright -->
-
-<!-- ~~~~~ New section ~~~~~ -->
-<sect2><title>License</title>
-<!-- Include copyright.sgml: -->
- &license;
-<!-- end copyright -->
-</sect2>
-<!-- ~ End section ~ -->
-
-
-<!-- ~~~~~ New section ~~~~~ -->
-
-<sect2 id="history"><title>History</title>
-<!-- Include history.sgml: -->
- &history;
-<!-- end history -->
-</sect2>
-
-<sect2 id="authors"><title>Authors</title>
-<!-- Include p-authors.sgml: -->
- &p-authors;
-<!-- end authors -->
-</sect2>
-
-</sect1>
-
-<!-- ~ End section ~ -->
-
-
-<!-- ~~~~~ New section ~~~~~ -->
-<sect1 id="seealso"><title>See Also</title>
-<!-- Include seealso.sgml: -->
- &seealso;
-<!-- end seealso -->
-</sect1>
-
-
-
-<!-- ~~~~~ New section ~~~~~ -->
-<sect1 id="appendix"><title>Appendix</title>
-
-
-<!-- ~~~~~ New section ~~~~~ -->
-<sect2 id="regex">
-<title>Regular Expressions</title>
-<para>
- <application>Privoxy</application> uses Perl-style <quote>regular
- expressions</quote> in its <link linkend="actions-file">actions
- files</link> and <link linkend="filter-file">filter file</link>,
- through the <ulink url="http://www.pcre.org/">PCRE</ulink> and
-<!--
- dead 08/27/06
- <ulink url="http://www.oesterhelt.org/pcrs/">PCRS</ulink> libraries.
--->
- <application>PCRS</application> libraries.
-</para>
-
-<para>
- If you are reading this, you probably don't understand what <quote>regular
- expressions</quote> are, or what they can do. So this will be a very brief
- introduction only. A full explanation would require a <ulink
- url="http://www.oreilly.com/catalog/regex/">book</ulink> ;-)
-</para>
-
-<para>
- Regular expressions provide a language to describe patterns that can be
- run against strings of characters (letters, numbers, etc.), to see if they
- match the string or not. The patterns are themselves (sometimes complex)
- strings of literal characters, combined with wild-cards, and other special
- characters, called meta-characters. The <quote>meta-characters</quote> have
- special meanings and are used to build complex patterns to be matched against.
- Perl Compatible Regular Expressions are an especially convenient
- <quote>dialect</quote> of the regular expression language.
-</para>
-
-<para>
- To make a simple analogy, we do something similar when we use wild-card
- characters when listing files with the <command>dir</command> command in DOS.
- <literal>*.*</literal> matches all filenames. The <quote>special</quote>
- character here is the asterisk which matches any and all characters. We can be
- more specific and use <literal>?</literal> to match just individual
- characters. So <quote>dir file?.txt</quote> would match
- <quote>file1.txt</quote>, <quote>file2.txt</quote>, etc. We are pattern
- matching, using a similar technique to <quote>regular expressions</quote>!
-</para>
-
-<para>
- Regular expressions do essentially the same thing, but are much, much more
- powerful. There are many more <quote>special characters</quote> and ways of
- building complex patterns however. Let's look at a few of the common ones,
- and then some examples:
-</para>
-
-<para><simplelist>
- <member>
- <emphasis>.</emphasis> - Matches any single character, e.g. <quote>a</quote>,
- <quote>A</quote>, <quote>4</quote>, <quote>:</quote>, or <quote>@</quote>.
- </member>
-</simplelist></para>
-
-<para><simplelist>
- <member>
- <emphasis>?</emphasis> - The preceding character or expression is matched ZERO or ONE
- times. Either/or.
- </member>
-</simplelist></para>
-
-<para><simplelist>
- <member>
- <emphasis>+</emphasis> - The preceding character or expression is matched ONE or MORE
- times.
- </member>
-</simplelist></para>
-
-<para><simplelist>
- <member>
- <emphasis>*</emphasis> - The preceding character or expression is matched ZERO or MORE
- times.
- </member>
-</simplelist></para>
-
-<para><simplelist>
- <member>
- <emphasis>\</emphasis> - The <quote>escape</quote> character denotes that
- the following character should be taken literally. This is used where one of the
- special characters (e.g. <quote>.</quote>) needs to be taken literally and
- not as a special meta-character. Example: <quote>example\.com</quote>, makes
- sure the period is recognized only as a period (and not expanded to its
- meta-character meaning of any single character).
- </member>
-</simplelist></para>
-
-<para><simplelist>
- <member>
- <emphasis>[ ]</emphasis> - Characters enclosed in brackets will be matched if
- any of the enclosed characters are encountered. For instance, <quote>[0-9]</quote>
- matches any numeric digit (zero through nine). As an example, we can combine
- this with <quote>+</quote> to match any digit one or more times: <quote>[0-9]+</quote>.
- </member>
-</simplelist></para>
-
-<para><simplelist>
- <member>
- <emphasis>( )</emphasis> - parentheses are used to group a sub-expression,
- or multiple sub-expressions.
- </member>
-</simplelist></para>
-
-<para><simplelist>
- <member>
- <emphasis>|</emphasis> - The <quote>bar</quote> character works like an
- <quote>or</quote> conditional statement. A match is successful if the
- sub-expression on either side of <quote>|</quote> matches. As an example:
- <quote>/(this|that) example/</quote> uses grouping and the bar character
- and would match either <quote>this example</quote> or <quote>that
- example</quote>, and nothing else.
- </member>
-</simplelist></para>
-
-<para>
- These are just some of the ones you are likely to use when matching URLs with
- <application>Privoxy</application>, and they are a long way from a definitive
- list. This is enough to get us started with a few simple examples which may
- be more illuminating:
-</para>
-
-<para>
- <emphasis><literal>/.*/banners/.*</literal></emphasis> - A simple example
- that uses the common combination of <quote>.</quote> and <quote>*</quote> to
- denote any character, zero or more times. In other words, any string at all.
- So we start with a literal forward slash, then our regular expression pattern
- (<quote>.*</quote>) another literal forward slash, the string
- <quote>banners</quote>, another forward slash, and lastly another
- <quote>.*</quote>. We are building
- a directory path here. This will match any file with the path that has a
- directory named <quote>banners</quote> in it. The <quote>.*</quote> matches
- any characters, and this could conceivably be more forward slashes, so it
- might expand into a much longer looking path. For example, this could match:
- <quote>/eye/hate/spammers/banners/annoy_me_please.gif</quote>, or just
- <quote>/banners/annoying.html</quote>, or almost an infinite number of other
- possible combinations, just so it has <quote>banners</quote> in the path
- somewhere.
-</para>
-
-<para>
- And now something a little more complex:
-</para>
-
-<para>
- <emphasis><literal>/.*/adv((er)?ts?|ertis(ing|ements?))?/</literal></emphasis> -
- We have several literal forward slashes again (<quote>/</quote>), so we are
- building another expression that is a file path statement. We have another
- <quote>.*</quote>, so we are matching against any conceivable sub-path, just so
- it matches our expression. The only true literal that <emphasis>must
- match</emphasis> our pattern is <application>adv</application>, together with
- the forward slashes. What comes after the <quote>adv</quote> string is the
- interesting part.
-</para>
-
-<para>
- Remember the <quote>?</quote> means the preceding expression (either a
- literal character or anything grouped with <quote>(...)</quote> in this case)
- can exist or not, since this means either zero or one match. So
- <quote>((er)?ts?|ertis(ing|ements?))</quote> is optional, as are the
- individual sub-expressions: <quote>(er)</quote>,
- <quote>(ing|ements?)</quote>, and the <quote>s</quote>. The <quote>|</quote>
- means <quote>or</quote>. We have two of those. For instance,
- <quote>(ing|ements?)</quote>, can expand to match either <quote>ing</quote>
- <emphasis>OR</emphasis> <quote>ements?</quote>. What is being done here, is an
- attempt at matching as many variations of <quote>advertisement</quote>, and
- similar, as possible. So this would expand to match just <quote>adv</quote>,
- or <quote>advert</quote>, or <quote>adverts</quote>, or
- <quote>advertising</quote>, or <quote>advertisement</quote>, or
- <quote>advertisements</quote>. You get the idea. But it would not match
- <quote>advertizements</quote> (with a <quote>z</quote>). We could fix that by
- changing our regular expression to:
- <quote>/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/</quote>, which would then match
- either spelling.
-</para>
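-
-<para>
- The claims above are easy to check by experiment. The following is a minimal
- sketch that uses Python's <literal>re</literal> module as a stand-in for PCRE,
- which is an assumption (the two agree for the simple constructs used here). The
- helper <literal>path_matches</literal> and the sample paths are made up for this
- example, and the pattern is only anchored at the start of the path.
-</para>

```python
import re

ORIGINAL = re.compile(r"/.*/adv((er)?ts?|ertis(ing|ements?))?/")
FIXED = re.compile(r"/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/")


def path_matches(pattern, path):
    # re.match anchors the pattern at the start of the path only.
    return pattern.match(path) is not None


words = ["adv", "advert", "adverts", "advertising",
         "advertisement", "advertisements"]

# Every variation listed in the text is accepted as a directory name.
matched = [w for w in words if path_matches(ORIGINAL, "/images/" + w + "/")]
print(matched == words)

# The spelling with "z" is only accepted by the extended pattern.
print(path_matches(ORIGINAL, "/images/advertizements/"))
print(path_matches(FIXED, "/images/advertizements/"))
```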
-
-<para>
- <emphasis><literal>/.*/advert[0-9]+\.(gif|jpe?g)</literal></emphasis> - Again
- another path statement with forward slashes. Anything in the square brackets
- <quote>[ ]</quote> can be matched. This is using <quote>0-9</quote> as a
- shorthand expression to mean any digit zero through nine. It is the same as
- saying <quote>0123456789</quote>. So any digit matches. The <quote>+</quote>
- means one or more of the preceding expression must be included. The preceding
- expression here is what is in the square brackets -- in this case, any digit
- zero through nine. Then, at the end, we have a grouping: <quote>(gif|jpe?g)</quote>.
- This includes a <quote>|</quote>, so this needs to match the expression on
- either side of that bar character also. A simple <quote>gif</quote> on one side, and the other
- side will in turn match either <quote>jpeg</quote> or <quote>jpg</quote>,
- since the <quote>?</quote> means the letter <quote>e</quote> is optional and
- can be matched once or not at all. So we are building an expression here to
- match GIF or JPEG image files. It must include the literal
- string <quote>advert</quote>, then one or more digits, and a <quote>.</quote>
- (which is now a literal, and not a special character, since it is escaped
- with <quote>\</quote>), and lastly either <quote>gif</quote>, or
- <quote>jpeg</quote>, or <quote>jpg</quote>. Some possible matches would
- include: <quote>//advert1.jpg</quote>,
- <quote>/nasty/ads/advert1234.gif</quote>,
- <quote>/banners/from/hell/advert99.jpg</quote>. It would not match
- <quote>advert1.gif</quote> (no leading slash), or
- <quote>/adverts232.jpg</quote> (the expression does not include an
- <quote>s</quote>), or <quote>/advert1.jsp</quote> (<quote>jsp</quote> is not
- in the expression anywhere).
-</para>
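-
-<para>
- The matches and non-matches listed above can be reproduced with a short
- Python sketch. Python's <literal>re</literal> module is used here in place of
- PCRE, which is an assumption that holds for this simple pattern; the variable
- names are chosen for this example only.
-</para>

```python
import re

PATTERN = re.compile(r"/.*/advert[0-9]+\.(gif|jpe?g)")

candidates = [
    "//advert1.jpg",
    "/nasty/ads/advert1234.gif",
    "/banners/from/hell/advert99.jpg",
    "advert1.gif",       # no leading slash
    "/adverts232.jpg",   # the expression does not include an "s"
    "/advert1.jsp",      # "jsp" is not in the expression
]

# re.match anchors the pattern at the start of each candidate path.
results = {path: PATTERN.match(path) is not None for path in candidates}

for path, matched in results.items():
    print(path, "matches" if matched else "does not match")
```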
-
-<para>
- We are barely scratching the surface of regular expressions here so that you
- can understand the default <application>Privoxy</application>
- configuration files, and maybe use this knowledge to customize your own
- installation. There is much, much more that can be done with regular
- expressions. Now that you know enough to get started, you can learn more on
- your own :/
-</para>
-
-<para>
- More reading on Perl Compatible Regular expressions:
- <ulink url="http://perldoc.perl.org/perlre.html">http://perldoc.perl.org/perlre.html</ulink>
-</para>
-
-<para>
- For information on regular expression based substitutions and their applications
- in filters, please see the <link linkend="filter-file">filter file tutorial</link>
- in this manual.
-</para>
-</sect2>
-
-<!-- ~ End section ~ -->
-
-
-<!-- ~~~~~ New section ~~~~~ -->
-<sect2>
-<title>Privoxy's Internal Pages</title>
-
-<para>
- Since <application>Privoxy</application> proxies each requested
- web page, it is easy for <application>Privoxy</application> to
- trap certain special URLs. In this way, we can talk directly to
- <application>Privoxy</application>, and see how it is
- configured, see how our rules are being applied, change these
- rules and other configuration options, and even turn
- <application>Privoxy's</application> filtering off, all with
- a web browser.
-
-</para>
-
-<para>
- The URLs listed below are the special ones that allow direct access
- to <application>Privoxy</application>. Of course,
- <application>Privoxy</application> must be running to access these. If
- not, you will get a friendly error message. Internet access is not
- necessary.
-</para>
-
-<para>
- <itemizedlist>
-
- <listitem>
- <para>
- Privoxy main page:
- </para>
- <blockquote>
- <para>
- <ulink url="http://config.privoxy.org/">http://config.privoxy.org/</ulink>
- </para>
- </blockquote>
- <para>
- There is a shortcut: <ulink url="http://p.p/">http://p.p/</ulink> (But it
- doesn't provide a fall-back to a real page, in case the request is not
- sent through <application>Privoxy</application>)
- </para>
- </listitem>
-
- <listitem>
- <para>
- Show information about the current configuration, including viewing and
- editing of actions files:
- </para>
- <blockquote>
- <para>
- <ulink url="http://config.privoxy.org/show-status">http://config.privoxy.org/show-status</ulink>
- </para>
- </blockquote>
- </listitem>
-
- <listitem>
- <para>
- Show the source code version numbers:
- </para>
- <blockquote>
- <para>
- <ulink url="http://config.privoxy.org/show-version">http://config.privoxy.org/show-version</ulink>
- </para>
- </blockquote>
- </listitem>
-
- <listitem>
- <para>
- Show the browser's request headers:
- </para>
- <blockquote>
- <para>
- <ulink url="http://config.privoxy.org/show-request">http://config.privoxy.org/show-request</ulink>
- </para>
- </blockquote>
- </listitem>
-
- <listitem>
- <para>
- Show which actions apply to a URL and why:
- </para>
- <blockquote>
- <para>
- <ulink url="http://config.privoxy.org/show-url-info">http://config.privoxy.org/show-url-info</ulink>
- </para>
- </blockquote>
- </listitem>
-
- <listitem>
- <para>
- Toggle Privoxy on or off. This feature can be turned off/on in the main
- <filename>config</filename> file. When toggled <quote>off</quote>, <quote>Privoxy</quote>
- continues to run, but only as a pass-through proxy, with no actions taking
- place:
- </para>
- <blockquote>
- <para>
- <ulink url="http://config.privoxy.org/toggle">http://config.privoxy.org/toggle</ulink>
- </para>
- </blockquote>
- <para>
- Short cuts. Turn off, then on:
- </para>
- <blockquote>
- <para>
- <ulink url="http://config.privoxy.org/toggle?set=disable">http://config.privoxy.org/toggle?set=disable</ulink>
- </para>
- </blockquote>
- <blockquote>
- <para>
- <ulink url="http://config.privoxy.org/toggle?set=enable">http://config.privoxy.org/toggle?set=enable</ulink>
- </para>
- </blockquote>
- </listitem>
-
- </itemizedlist>
-</para>
-
-<para>
- These may be bookmarked for quick reference. See next.
-
-</para>
-
-<sect3 id="bookmarklets">
-<title>Bookmarklets</title>
-<para>
- Below are some <quote>bookmarklets</quote> to allow you to easily access a
- <quote>mini</quote> version of some of <application>Privoxy's</application>
- special pages. They are designed for MS Internet Explorer, but should work
- equally well in Netscape, Mozilla, and other browsers which support
- JavaScript. They are designed to run directly from your bookmarks - not by
- clicking the links below (although that should work for testing).
-</para>
-<para>
- To save them, right-click the link and choose <quote>Add to Favorites</quote>
- (IE) or <quote>Add Bookmark</quote> (Netscape). You will get a warning that
- the bookmark <quote>may not be safe</quote> - just click OK. Then you can run the
- Bookmarklet directly from your favorites/bookmarks. For even faster access,
- you can put them on the <quote>Links</quote> bar (IE) or the <quote>Personal
- Toolbar</quote> (Netscape), and run them with a single click.
-</para>
-
-<para>
- <itemizedlist>
-
- <listitem>
- <para>
- <ulink
- url="javascript:void(window.open('http://config.privoxy.org/toggle?mini=y&amp;set=enabled','ijbstatus','width=250,height=100,resizable=yes,scrollbars=no,toolbar=no,location=no,directories=no,status=no,menubar=no,copyhistory=no').focus());">Privoxy - Enable</ulink>
- </para>
- </listitem>
-
- <listitem>
- <para>
- <ulink
- url="javascript:void(window.open('http://config.privoxy.org/toggle?mini=y&amp;set=disabled','ijbstatus','width=250,height=100,resizable=yes,scrollbars=no,toolbar=no,location=no,directories=no,status=no,menubar=no,copyhistory=no').focus());">Privoxy - Disable</ulink>
- </para>
- </listitem>
-
- <listitem>
- <para>
- <ulink
- url="javascript:void(window.open('http://config.privoxy.org/toggle?mini=y&amp;set=toggle','ijbstatus','width=250,height=100,resizable=yes,scrollbars=no,toolbar=no,location=no,directories=no,status=no,menubar=no,copyhistory=no').focus());">Privoxy - Toggle Privoxy</ulink> (Toggles between enabled and disabled)
- </para>
- </listitem>
-
- <listitem>
- <para>
- <ulink
- url="javascript:void(window.open('http://config.privoxy.org/toggle?mini=y','ijbstatus','width=250,height=2,resizable=yes,scrollbars=no,toolbar=no,location=no,directories=no,status=no,menubar=no,copyhistory=no').focus());">Privoxy - View Status</ulink>
- </para>
- </listitem>
-<!--
- <listitem>
- <para>
- <ulink url="javascript:w=Math.floor(screen.width/2);h=Math.floor(screen.height*0.9);void(window.open('http://www.privoxy.org/actions/index.php?url='+escape(location.href),'Feedback','screenx='+w+',width='+w+',height='+h+',scrollbars=yes,toolbar=no,location=no,directories=no,status=no,menubar=no,copyhistory=no').focus());">Privoxy - Submit Actions File Feedback</ulink>
- </para>
- </listitem>
- -->
- <listitem>
- <para>
- <ulink url="javascript:void(window.open('http://config.privoxy.org/show-url-info?url='+escape(location.href),'Why').focus());">Privoxy - Why?</ulink>
- </para>
- </listitem>
- </itemizedlist>
-</para>
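-
-<para>
- The <quote>Why?</quote> bookmarklet simply appends the escaped address of the
- current page to the <literal>show-url-info</literal> URL. A rough equivalent
- outside the browser might look like the Python sketch below. The function names
- are hypothetical, and <literal>urllib.parse.quote</literal> escapes a few more
- characters than JavaScript's <literal>escape()</literal> does.
-</para>

```python
from urllib.parse import quote

CONFIG_BASE = "http://config.privoxy.org"


def show_url_info_link(page_url):
    """Build the show-url-info address for page_url (hypothetical helper)."""
    # safe="" percent-encodes ':' and '/' as well, so that the whole
    # address is passed as a single query parameter value.
    return CONFIG_BASE + "/show-url-info?url=" + quote(page_url, safe="")


def toggle_link(enable):
    """Build the toggle shortcut address (hypothetical helper)."""
    return CONFIG_BASE + "/toggle?set=" + ("enable" if enable else "disable")


link = show_url_info_link("http://www.google.com/")
print(link)
print(toggle_link(False))
```

-<para>
- Opening such a link only gives a meaningful answer when the request is sent
- through <application>Privoxy</application>.
-</para>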
-
-<para>
- Credit: The site which gave us the general idea for these bookmarklets is
- <ulink url="http://www.bookmarklets.com/">www.bookmarklets.com</ulink>. They
- have more information about bookmarklets.
-</para>
-
-
-</sect3>
-
-</sect2>
-
-
-<!-- ~~~~~ New section ~~~~~ -->
-<sect2 id="chain">
-<title>Chain of Events</title>
-<para>
- Let's take a quick look at how some of <application>Privoxy's</application>
- core features are triggered, and the ensuing sequence of events when a web
- page is requested by your browser:
-</para>
-
-<para>
- <itemizedlist>
- <listitem>
- <para>
- First, your web browser requests a web page. The browser knows to send
- the request to <application>Privoxy</application>, which will in turn,
- relay the request to the remote web server after passing the following
- tests:
- </para>
- </listitem>
- <listitem>
- <para>
- <application>Privoxy</application> traps any request for its own internal CGI
- pages (e.g. <ulink url="http://p.p/">http://p.p/</ulink>) and sends the CGI page back to the browser.
- </para>
- </listitem>
- <listitem>
- <para>
- Next, <application>Privoxy</application> checks to see if the URL
- matches any <link
- linkend="BLOCK"><quote>+block</quote></link> patterns. If
- so, the URL is then blocked, and the remote web server will not be contacted.
- <link linkend="HANDLE-AS-IMAGE"><quote>+handle-as-image</quote></link>
- and
- <link linkend="HANDLE-AS-EMPTY-DOCUMENT"><quote>+handle-as-empty-document</quote></link>
- are then checked, and if there is no match, an
- HTML <quote>BLOCKED</quote> page is sent back to the browser. Otherwise, if
- it does match, an image is returned for the former, and an empty text
- document for the latter. The type of image would depend on the setting of
- <link linkend="SET-IMAGE-BLOCKER"><quote>+set-image-blocker</quote></link>
- (blank, checkerboard pattern, or an HTTP redirect to an image elsewhere).
- </para>
- </listitem>
- <listitem>
- <para>
- Untrusted URLs are blocked. If URLs are being added to the
- <filename>trust</filename> file, then that is done.
- </para>
- </listitem>
- <listitem>
- <para>
- If the URL pattern matches the <link
- linkend="FAST-REDIRECTS"><quote>+fast-redirects</quote></link> action,
- it is then processed. Unwanted parts of the requested URL are stripped.
- </para>
- </listitem>
- <listitem>
- <para>
- Now the rest of the client browser's request headers are processed. If any
- of these match any of the relevant actions (e.g. <link
- linkend="HIDE-USER-AGENT"><quote>+hide-user-agent</quote></link>,
- etc.), headers are suppressed or forged as determined by these actions and
- their parameters.
- </para>
- </listitem>
- <listitem>
- <para>
- Now the web server starts sending its response back (i.e. typically a web
- page).
- </para>
- </listitem>
- <listitem>
- <para>
- First, the server headers are read and processed to determine, among other
- things, the MIME type (document type) and encoding. The headers are then
- filtered as determined by the
- <link linkend="CRUNCH-INCOMING-COOKIES"><quote>+crunch-incoming-cookies</quote></link>,
- <link linkend="SESSION-COOKIES-ONLY"><quote>+session-cookies-only</quote></link>,
- and <link linkend="DOWNGRADE-HTTP-VERSION"><quote>+downgrade-http-version</quote></link>
- actions.
- </para>
- </listitem>
- <listitem>
- <para>
- If any <link linkend="FILTER"><quote>+filter</quote></link> action
- or <link
- linkend="DEANIMATE-GIFS"><quote>+deanimate-gifs</quote></link>
- action applies (and the document type fits the action), the rest of the page is
- read into memory (up to a configurable limit). Then the filter rules (from
- <filename>default.filter</filename> and any other filter files) are
- processed against the buffered content. Filters are applied in the order
- they are specified in one of the filter files. Animated GIFs, if present,
- are reduced to either the first or last frame, depending on the action
- setting. The entire page, which is now filtered, is then sent by
- <application>Privoxy</application> back to your browser.
- </para>
- <para>
- If neither a <link linkend="FILTER"><quote>+filter</quote></link> action
- nor a <link
- linkend="DEANIMATE-GIFS"><quote>+deanimate-gifs</quote></link> action
- matches, then <application>Privoxy</application> passes the raw data through
- to the client browser as it becomes available.
- </para>
- </listitem>
- <listitem>
- <para>
- As the browser receives the (now possibly filtered) page content, it
- reads and then requests any URLs that may be embedded within the page
- source, e.g. ad images, stylesheets, JavaScript, other HTML documents (e.g.
- frames), sounds, etc. For each of these objects, the browser issues a
- separate request (this is easily viewable in <application>Privoxy's</application>
- logs). And each such request is in turn processed just as above. Note that a
- complex web page will have many, many such embedded URLs. If these
- secondary requests are to a different server, then quite possibly a very
- differing set of actions is triggered.
- </para>
- </listitem>
-
- </itemizedlist>
-</para>
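-
-<para>
- The decision about what is sent back for a blocked URL can be summarized in a
- small Python sketch. It is a simplification and not
- <application>Privoxy's</application> code: the function name and the returned
- strings are invented for this example, and checking
- <quote>handle-as-image</quote> before <quote>handle-as-empty-document</quote>
- is an assumption based only on the order in which the two are listed above.
-</para>

```python
def blocked_response(active_actions):
    """Return what the browser receives, given the set of active action names.

    Hypothetical helper; the return values are labels, not real responses.
    """
    if "block" not in active_actions:
        # Not blocked: the request goes on to the remote web server.
        return "forward request"
    if "handle-as-image" in active_actions:
        # The kind of image depends on the set-image-blocker setting.
        return "image"
    if "handle-as-empty-document" in active_actions:
        return "empty document"
    return "BLOCKED page"


print(blocked_response({"block"}))
print(blocked_response({"block", "handle-as-image"}))
print(blocked_response({"block", "handle-as-empty-document"}))
print(blocked_response({"filter"}))
```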
-<para>
- NOTE: This is a somewhat simplistic overview of what happens with each URL
- request. For the sake of brevity and simplicity, we have focused on
- <application>Privoxy's</application> core features only.
-</para>
-
-</sect2>
-
-
-<!-- ~~~~~ New section ~~~~~ -->
-<sect2 id="actionsanat">
-<title>Troubleshooting: Anatomy of an Action</title>
-
-<para>
- The way <application>Privoxy</application> applies
- <link linkend="ACTIONS">actions</link> and <link linkend="FILTER">filters</link>
- to any given URL can be complex, and it is not always easy to understand
- what is happening. Sometimes we need to be able to
- <emphasis>see</emphasis> just what <application>Privoxy</application> is
- doing, especially if something <application>Privoxy</application> is doing
- is inadvertently causing us a problem. It can be a little daunting to look at
- the actions and filters files themselves, since they tend to be filled with
- <link linkend="regex">regular expressions</link> whose consequences are not
- always so obvious.
-</para>
-
-<para>
- One quick test to see if <application>Privoxy</application> is causing a problem
- or not is to disable it temporarily. This should be the first troubleshooting
- step. See <link linkend="bookmarklets">the Bookmarklets</link> section for a quick
- and easy way to do this (be sure to flush caches afterward!). Looking at the
- logs is a good idea too. (Note that both the toggle feature and logging are
- enabled via <filename>config</filename> file settings, and may need to be
- turned <quote>on</quote>.)
-</para>
-<para>
- Another easy troubleshooting step, if you have customized your
- installation, is to revert to the installed
- defaults and see if that helps. There are times the developers get complaints
- about one thing or another, and the problem turns out to be caused by a
- customized configuration.
-</para>
-
-<para>
- <application>Privoxy</application> also provides the
- <ulink url="http://config.privoxy.org/show-url-info">http://config.privoxy.org/show-url-info</ulink>
- page that can show us very specifically how <application>actions</application>
- are being applied to any given URL. This is a big help for troubleshooting.
-</para>
-
-<para>
- First, enter one URL (or partial URL) at the prompt, and then
- <application>Privoxy</application> will tell us
- how the current configuration will handle it. This will not
- help with filtering effects (i.e. the <link
- linkend="FILTER"><quote>+filter</quote></link> action) from
- one of the filter files, since this is handled very
- differently and is not so easy to trap! It also will not tell you about any other
- URLs that may be embedded within the URL you are testing. For instance, images
- such as ads are expressed as URLs within the raw page source of HTML pages. So
- you will only get info for the actual URL that is pasted into the prompt area
- -- not any sub-URLs. If you want to know about embedded URLs like ads, you
- will have to dig those out of the HTML source. Use your browser's <quote>View
- Page Source</quote> option for this. Or right click on the ad, and grab the
- URL.
-</para>
-
-<para>
- Let's try an example, <ulink url="http://google.com">google.com</ulink>,
- and look at it one section at a time in a sample configuration (your real
- configuration may vary):
-</para>
-
-<para>
- <screen>
- Matches for http://www.google.com:
-
- In file: default.action <guibutton>[ View ]</guibutton> <guibutton>[ Edit ]</guibutton>
-
- {+change-x-forwarded-for{block}
- +deanimate-gifs {last}
- +fast-redirects {check-decoded-url}
- +filter {refresh-tags}
- +filter {img-reorder}
- +filter {banners-by-size}
- +filter {webbugs}
- +filter {jumping-windows}
- +filter {ie-exploits}
- +hide-from-header {block}
- +hide-referrer {forge}
- +session-cookies-only
- +set-image-blocker {pattern}
-/
-
- { -session-cookies-only }
- .google.com
-
- { -fast-redirects }
- .google.com
-
-In file: user.action <guibutton>[ View ]</guibutton> <guibutton>[ Edit ]</guibutton>
-(no matches in this file)
-</screen>
-</para>
-
-<para>
- This is telling us how we have defined our
- <link linkend="ACTIONS"><quote>actions</quote></link>, and
- which ones match for our test case, <quote>google.com</quote>.
- Displayed are all the actions that are available to us. Remember,
- the <literal>+</literal> sign denotes <quote>on</quote>. <literal>-</literal>
- denotes <quote>off</quote>. So some are <quote>on</quote> here, but many
- are <quote>off</quote>. Each example we try may provide a slightly different
- end result, depending on our configuration directives.
-</para>
-<para>
- The first listing
- is for our <filename>default.action</filename> file. The large, multi-line
- listing, is how the actions are set to match for all URLs, i.e. our default
- settings. If you look at your <quote>actions</quote> file, this would be the
- section just below the <quote>aliases</quote> section near the top. This
- will apply to all URLs as signified by the single forward slash at the end
- of the listing -- <quote> / </quote>.
-</para>
-
-<para>
- But we have defined additional actions that are exceptions to these general
- rules, followed by the specific URLs (or patterns) that these exceptions
- apply to. Last match wins. Just below the defaults are two explicit
- matches for <quote>.google.com</quote>. The first negates our previous
- cookie setting, which was <link
- linkend="SESSION-COOKIES-ONLY"><quote>+session-cookies-only</quote></link>
- (i.e. not persistent). So we will allow persistent cookies for google, at
- least that is how it is in this example. The second turns
- <emphasis>off</emphasis> the <link
- linkend="FAST-REDIRECTS"><quote>+fast-redirects</quote></link>
- action, allowing redirects to take place unmolested. Note that there is a leading
- dot here -- <quote>.google.com</quote>. This matches any host or
- sub-domain in the google.com domain, such as
- <quote>www.google.com</quote> or <quote>mail.google.com</quote>. But it would not
- match <quote>www.google.de</quote>! So, apparently, we have these two actions
- defined as exceptions to the general rules at the top, somewhere in the lower
- part of our <filename>default.action</filename> file, and
- <quote>google.com</quote> is referenced somewhere in these latter sections.
-</para>
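-
-<para>
- As a minimal sketch of how the dots in a host pattern behave, the
- hypothetical section below (it is not part of the example configuration)
- uses <quote>.google.</quote> instead of <quote>.google.com</quote>. A pattern
- with both a leading and a trailing dot should match any host name that
- contains <quote>google</quote> as a domain component, so it would also
- cover <quote>www.google.de</quote>:
-</para>
-
-<para>
- <screen>
-
- # Hypothetical section, for illustration only
- { -session-cookies-only }
- .google.
-</screen>
-</para>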
-
-<para>
- Then, for our <filename>user.action</filename> file, we again have no hits.
- So there is nothing google-specific that we might have added to our own, local
- configuration. If there were, those actions would overrule any actions from
- previously processed files, such as <filename>default.action</filename>.
- <filename>user.action</filename> typically has the last word. This is the
- best place to put hard and fast exceptions.
-</para>
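-
-<para>
- For illustration only, a hypothetical <filename>user.action</filename> entry
- that restores session-only cookies for google might look like the sketch
- below. Assuming <filename>user.action</filename> is processed after
- <filename>default.action</filename>, as is typical, this section would be
- the last match for <quote>google.com</quote> and would therefore win:
-</para>
-
-<para>
- <screen>
-
- # Hypothetical user.action entry, not part of the example configuration
- { +session-cookies-only }
- .google.com
-</screen>
-</para>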
-
-<para>
- And finally we pull it all together in the bottom section and summarize how
- <application>Privoxy</application> is applying all its <quote>actions</quote>
- to <quote>google.com</quote>:
-
-</para>
-
-<para>
- <screen>
-
- Final results:
-
- -add-header
- -block
- +change-x-forwarded-for{block}
- -client-header-filter{hide-tor-exit-notation}
- -content-type-overwrite
- -crunch-client-header
- -crunch-if-none-match
- -crunch-incoming-cookies
- -crunch-outgoing-cookies
- -crunch-server-header
- +deanimate-gifs {last}
- -downgrade-http-version
- -fast-redirects
- -filter {js-events}
- -filter {content-cookies}
- -filter {all-popups}
- -filter {banners-by-link}
- -filter {tiny-textforms}
- -filter {frameset-borders}
- -filter {demoronizer}
- -filter {shockwave-flash}
- -filter {quicktime-kioskmode}
- -filter {fun}
- -filter {crude-parental}
- -filter {site-specifics}
- -filter {js-annoyances}
- -filter {html-annoyances}
- +filter {refresh-tags}
- -filter {unsolicited-popups}
- +filter {img-reorder}
- +filter {banners-by-size}
- +filter {webbugs}
- +filter {jumping-windows}
- +filter {ie-exploits}
- -filter {google}
- -filter {yahoo}
- -filter {msn}
- -filter {blogspot}
- -filter {no-ping}
- -force-text-mode
- -handle-as-empty-document
- -handle-as-image
- -hide-accept-language
- -hide-content-disposition
- +hide-from-header {block}
- -hide-if-modified-since
- +hide-referrer {forge}
- -hide-user-agent
- -limit-connect
- -overwrite-last-modified
- -prevent-compression
- -redirect
- -server-header-filter{xml-to-html}
- -server-header-filter{html-to-xml}
- -session-cookies-only
- +set-image-blocker {pattern} </screen>
-</para>
-
-<para>
- Notice that the only differences from the previous listing are
- <quote>fast-redirects</quote> and <quote>session-cookies-only</quote>,
- which are turned off specifically for this site in our configuration,
- and thus show as <quote>off</quote> in the <quote>Final results</quote>.
-</para>
-
-<para>
- Now another example, <quote>ad.doubleclick.net</quote>:
-</para>
-
-<para>
- <screen>
-
- { +block{Domains starts with "ad"} }
- ad*.
-
- { +block{Domain contains "ad"} }
- .ad.
-
- { +block{Doubleclick banner server} +handle-as-image }
- .[a-vx-z]*.doubleclick.net
-</screen>
-</para>
-
-<para>
- We'll just show the interesting part here - the explicit matches. The URL is
- matched three different times: by two <quote>+block{}</quote> sections,
- and by a <quote>+block{} +handle-as-image</quote> section,
- which is the expanded form of one of our aliases that had been defined as:
- <quote>+block-as-image</quote>. (<link
- linkend="ALIASES"><quote>Aliases</quote></link> are defined in
- the first section of the actions file and typically used to combine more
- than one action.)
-</para>
-
-<para>
- Any one of these would have done the trick and blocked this as an unwanted
- image. This is redundant, since the last case would effectively
- also cover the first. No point in taking chances with these guys
- though ;-) Note that if you want an ad or obnoxious
- URL to be invisible, it should be defined as <quote>ad.doubleclick.net</quote>
- is done here -- as both a <link
- linkend="BLOCK"><quote>+block{}</quote></link>
- <emphasis>and</emphasis> an
- <link linkend="HANDLE-AS-IMAGE"><quote>+handle-as-image</quote></link>.
- The custom alias <quote><literal>+block-as-image</literal></quote> just
- simplifies the process and makes it more readable.
-</para>
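-
-<para>
- A sketch of how such an alias could be defined and then used is shown
- below. The block reason given here is an assumption made for illustration;
- check the alias section of your own actions file for the actual definition:
-</para>
-
-<para>
- <screen>
-
- {{alias}}
- # Assumed definition; the block reason text is illustrative
- +block-as-image = +block{Blocked image request.} +handle-as-image
-
- # Hypothetical use of the alias
- { +block-as-image }
- ad.doubleclick.net
-</screen>
-</para>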
-
-<para>
- One last example. Let's try <quote>http://www.example.net/adsl/HOWTO/</quote>.
- This one is giving us problems. We are getting a blank page. Hmmm ...
-</para>
-
-<para>
- <screen>
-
- Matches for http://www.example.net/adsl/HOWTO/:
-
- In file: default.action <guibutton>[ View ]</guibutton> <guibutton>[ Edit ]</guibutton>
-
- {-add-header
- -block
- +change-x-forwarded-for{block}
- -client-header-filter{hide-tor-exit-notation}
- -content-type-overwrite
- -crunch-client-header
- -crunch-if-none-match
- -crunch-incoming-cookies
- -crunch-outgoing-cookies
- -crunch-server-header
- +deanimate-gifs
- -downgrade-http-version
- +fast-redirects {check-decoded-url}
- -filter {js-events}
- -filter {content-cookies}
- -filter {all-popups}
- -filter {banners-by-link}
- -filter {tiny-textforms}
- -filter {frameset-borders}
- -filter {demoronizer}
- -filter {shockwave-flash}
- -filter {quicktime-kioskmode}
- -filter {fun}
- -filter {crude-parental}
- -filter {site-specifics}
- -filter {js-annoyances}
- -filter {html-annoyances}
- +filter {refresh-tags}
- -filter {unsolicited-popups}
- +filter {img-reorder}
- +filter {banners-by-size}
- +filter {webbugs}
- +filter {jumping-windows}
- +filter {ie-exploits}
- -filter {google}
- -filter {yahoo}
- -filter {msn}
- -filter {blogspot}
- -filter {no-ping}
- -force-text-mode
- -handle-as-empty-document
- -handle-as-image
- -hide-accept-language
- -hide-content-disposition
- +hide-from-header{block}
- +hide-referer{forge}
- -hide-user-agent
- -overwrite-last-modified
- +prevent-compression
- -redirect
- -server-header-filter{xml-to-html}
- -server-header-filter{html-to-xml}
- +session-cookies-only
- +set-image-blocker{blank} }
- /
-
- { +block{Path contains "ads".} +handle-as-image }
- /ads
-</screen>
-</para>
-
-<para>
- Oops, the <quote>/adsl/</quote> is matching <quote>/ads</quote> in our
- configuration! But we did not want this at all! Now we see why we get the
- blank page. It is actually triggering two different actions here, and
- the effects are aggregated so that the URL is blocked, and &my-app; is told
- to treat the block as if it were an image. But this is, of course, all wrong.
- We could now add a new action below this (or better in our own
- <filename>user.action</filename> file) that explicitly
- <emphasis>un</emphasis>blocks
- (<link linkend="BLOCK"><quote>{-block}</quote></link>) paths with
- <quote>adsl</quote> in them (remember, last match in the configuration
- wins). There are various ways to handle such exceptions. Example:
-</para>
-
-<para>
- <screen>
-
- { -block }
- /adsl
-</screen>
-</para>
-
-<para>
- Now the page displays ;-)
- Remember to flush your browser's caches when making these kinds of changes to
- your configuration to ensure that you get a freshly delivered page! Or, try
- using <literal>Shift+Reload</literal>.
-</para>
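-
-<para>
- As a sketch of one of the other ways to handle such an exception, the
- hypothetical section below limits the unblocking to the affected host
- rather than applying it to every site whose path starts with
- <quote>/adsl</quote>. Placing it in <filename>user.action</filename> is an
- assumption that follows the advice above:
-</para>
-
-<para>
- <screen>
-
- # Hypothetical user.action entry: unblock only this site's /adsl pages
- { -block }
- www.example.net/adsl
-</screen>
-</para>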
-
-<para>
- But now what about a situation where we get no explicit matches like
- we did with:
-</para>
-
-<para>
- <screen>
-
- { +block{Path starts with "ads".} +handle-as-image }
- /ads
-</screen>
-</para>