- but the aliases themselves are lost when you edit sections that use aliases
- with it.
-</para>
-
-<para>
- Now let's define some aliases...
-</para>
-
-<para>
- <screen>
- # Useful custom aliases we can use later.
- #
- # Note the (required!) section header line and that this section
- # must be at the top of the actions file!
- #
- {{alias}}
-
- # These aliases just save typing later:
- # (Note that some already use other aliases!)
- #
- +crunch-all-cookies = +<link linkend="CRUNCH-INCOMING-COOKIES">crunch-incoming-cookies</link> +<link linkend="CRUNCH-OUTGOING-COOKIES">crunch-outgoing-cookies</link>
- -crunch-all-cookies = -<link linkend="CRUNCH-INCOMING-COOKIES">crunch-incoming-cookies</link> -<link linkend="CRUNCH-OUTGOING-COOKIES">crunch-outgoing-cookies</link>
- +block-as-image = +block{Blocked image.} +handle-as-image
- allow-all-cookies = -crunch-all-cookies -<link linkend="SESSION-COOKIES-ONLY">session-cookies-only</link> -<link linkend="FILTER-CONTENT-COOKIES">filter{content-cookies}</link>
-
- # These aliases define combinations of actions
- # that are useful for certain types of sites:
- #
- fragile = -<link linkend="BLOCK">block</link> -<link linkend="FILTER">filter</link> -crunch-all-cookies -<link linkend="FAST-REDIRECTS">fast-redirects</link> -<link linkend="HIDE-REFERER">hide-referrer</link> -<link linkend="PREVENT-COMPRESSION">prevent-compression</link>
-
- shop = -crunch-all-cookies -<link linkend="FILTER-ALL-POPUPS">filter{all-popups}</link>
-
- # Short names for other aliases, for really lazy people ;-)
- #
- c0 = +crunch-all-cookies
- c1 = -crunch-all-cookies</screen>
-</para>
-
-<para>
- ...and put them to use. These sections would appear in the lower part of an
- actions file and define exceptions to the default actions (as specified further
- up for the <quote>/</quote> pattern):
-</para>
-
-<para>
- <screen>
- # These sites are either very complex or very keen on
- # user data and require minimal interference to work:
- #
- {fragile}
- .office.microsoft.com
- .windowsupdate.microsoft.com
- # Gmail is really mail.google.com, not gmail.com
- mail.google.com
-
- # Shopping sites:
- # Allow cookies (for setting and retrieving your customer data)
- #
- {shop}
- .quietpc.com
- .worldpay.com # for quietpc.com
- mybank.example.com
-
- # These shops require pop-ups:
- #
- {-filter{all-popups} -filter{unsolicited-popups}}
- .dabs.com
- .overclockers.co.uk</screen>
-</para>
-
-<para>
- Aliases like <quote>shop</quote> and <quote>fragile</quote> are typically used for
- <quote>problem</quote> sites that require more than one action to be disabled
- in order to function properly.
-</para>
-</sect2>
-<!-- ~~~~~ New section ~~~~~ -->
-<sect2 id="act-examples">
-<title>Actions Files Tutorial</title>
-<para>
- The above chapters have shown <link linkend="actions-file">which actions files
- there are and how they are organized</link>, how actions are <link
- linkend="actions">specified</link> and <link linkend="actions-apply">applied
- to URLs</link>, how <link linkend="af-patterns">patterns</link> work, and how to
- define and use <link linkend="aliases">aliases</link>. Now, let's look at an
- example <filename>match-all.action</filename>, <filename>default.action</filename>
- and <filename>user.action</filename> file and see how all these pieces come together:
-</para>
-
-<sect3>
-<title>match-all.action</title>
-<para>
- Remember <emphasis>all actions are disabled when matching starts</emphasis>,
- so we have to explicitly enable the ones we want.
-</para>
-
-<para>
- While the <filename>match-all.action</filename> file only contains a
- single section, it is probably the most important one. It has only one
- pattern, <quote><literal>/</literal></quote>, but this pattern
- <link linkend="af-patterns">matches all URLs</link>. Therefore, the set of
- actions used in this <quote>default</quote> section <emphasis>will
- be applied to all requests as a start</emphasis>. It can be partly or
- wholly overridden by other actions files like <filename>default.action</filename>
- and <filename>user.action</filename>, but it will still be largely responsible
- for your overall browsing experience.
-</para>
-
-<para>
- Again, at the start of matching, all actions are disabled, so there is
- no need to disable any actions here. (Remember: a <quote>+</quote>
- preceding the action name enables the action, a <quote>-</quote> disables!).
- Also note how this long line has been made more readable by splitting it into
- multiple lines with line continuation.
-</para>
-
-<para>
- <screen>
-{ \
- +<link linkend="CHANGE-X-FORWARDED-FOR">change-x-forwarded-for{block}</link> \
- +<link linkend="HIDE-FROM-HEADER">hide-from-header{block}</link> \
- +<link linkend="SET-IMAGE-BLOCKER">set-image-blocker{pattern}</link> \
-}
-/ # Match all URLs
- </screen>
-</para>
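-
-<para>
- For comparison, the same section could be written on a single line,
- without line continuation. This is only a sketch to show that both
- forms express the same thing:
-</para>
-
-<para>
- <screen>
-{ +change-x-forwarded-for{block} +hide-from-header{block} +set-image-blocker{pattern} }
-/ # Match all URLs</screen>
-</para>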
-
-<para>
- The default behavior is now set.
-</para>
-</sect3>
-
-<sect3>
-<title>default.action</title>
-
-<para>
- If you aren't a developer, there's no need for you to edit the
- <filename>default.action</filename> file. It is maintained by
- the &my-app; developers and if you disagree with some of the
- sections, you should overrule them in your <filename>user.action</filename>.
-</para>
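-
-<para>
- As a minimal sketch of such an overrule, assume that
- <filename>default.action</filename> disables
- <literal>fast-redirects</literal> for a site where you would prefer to have
- it enabled. The host name below is a placeholder and the parameter is only
- one possible choice; a section like this in <filename>user.action</filename>
- would win, because <filename>user.action</filename> is read last:
-</para>
-
-<para>
- <screen>
-# In user.action (hypothetical host, shown for illustration only):
-#
-{ +fast-redirects{check-decoded-url} }
-.example.org</screen>
-</para>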
-
-<para>
- Understanding the <filename>default.action</filename> file can
- help you with your <filename>user.action</filename>, though.
-</para>
-
-<para>
- The first section in this file is a special section for internal use
- that prevents older &my-app; versions from reading the file:
-</para>
-
-<para>
- <screen>
-##########################################################################
-# Settings -- Don't change! For internal Privoxy use ONLY.
-##########################################################################
-{{settings}}
-for-privoxy-version=3.0.11</screen>
-</para>
-
-<para>
- After that comes the (optional) alias section. We'll use the example
- section from the above <link linkend="aliases">chapter on aliases</link>,
- which also explains why and how aliases are used:
-</para>
-
-<para>
- <screen>
-##########################################################################
-# Aliases
-##########################################################################
-{{alias}}
-
- # These aliases just save typing later:
- # (Note that some already use other aliases!)
- #
- +crunch-all-cookies = +<link linkend="CRUNCH-INCOMING-COOKIES">crunch-incoming-cookies</link> +<link linkend="CRUNCH-OUTGOING-COOKIES">crunch-outgoing-cookies</link>
- -crunch-all-cookies = -<link linkend="CRUNCH-INCOMING-COOKIES">crunch-incoming-cookies</link> -<link linkend="CRUNCH-OUTGOING-COOKIES">crunch-outgoing-cookies</link>
- +block-as-image = +block{Blocked image.} +handle-as-image
- mercy-for-cookies = -crunch-all-cookies -<link linkend="SESSION-COOKIES-ONLY">session-cookies-only</link> -<link linkend="FILTER-CONTENT-COOKIES">filter{content-cookies}</link>
-
- # These aliases define combinations of actions
- # that are useful for certain types of sites:
- #
- fragile = -<link linkend="BLOCK">block</link> -<link linkend="FILTER">filter</link> -crunch-all-cookies -<link linkend="FAST-REDIRECTS">fast-redirects</link> -<link linkend="HIDE-REFERER">hide-referrer</link>
- shop = -crunch-all-cookies -<link linkend="FILTER-ALL-POPUPS">filter{all-popups}</link></screen>
-</para>
-
-<para>
- The first of our specialized sections is concerned with <quote>fragile</quote>
- sites, i.e. sites that require minimum interference, because they are either
- very complex or very keen on tracking you (and have mechanisms in place that
- make them unusable for people who avoid being tracked). We will simply use
- our pre-defined <literal>fragile</literal> alias instead of stating the list
- of actions explicitly:
-</para>
-
-<para>
- <screen>
-##########################################################################
-# Exceptions for sites that'll break under the default action set:
-##########################################################################
-
-# "Fragile" Use a minimum set of actions for these sites (see alias above):
-#
-{ fragile }
-.office.microsoft.com # surprise, surprise!
-.windowsupdate.microsoft.com
-mail.google.com</screen>
-</para>
-
-<para>
- Shopping sites are not as fragile, but they typically
- require cookies to log in, and pop-up windows for shopping
- carts or item details. Again, we'll use a pre-defined alias:
-</para>
-
-<para>
- <screen>
-# Shopping sites:
-#
-{ shop }
-.quietpc.com
-.worldpay.com # for quietpc.com
-.jungle.com
-.scan.co.uk</screen>
-</para>
-
-<para>
- The <literal><link linkend="FAST-REDIRECTS">fast-redirects</link></literal>
- action, which may have been enabled in <filename>match-all.action</filename>,
- breaks some sites. So disable it for popular sites where we know it misbehaves:
-</para>
-
-<para>
- <screen>
-{ -<link linkend="FAST-REDIRECTS">fast-redirects</link> }
-login.yahoo.com
-edit.*.yahoo.com
-.google.com
-.altavista.com/.*(like|url|link):http
-.altavista.com/trans.*urltext=http
-.nytimes.com</screen>
-</para>
-
-<para>
- It is important that <application>Privoxy</application> knows which
- URLs belong to images, so that <emphasis>if</emphasis> they are to
- be blocked, a substitute image can be sent, rather than an HTML page.
- Contacting the remote site to find out is not an option, since it
- would destroy the loading time advantage of banner blocking, and it
- would feed the advertisers information about you. We can mark any
- URL as an image with the <literal><link
- linkend="handle-as-image">handle-as-image</link></literal> action,
- and marking all URLs that end in a known image file extension is a
- good start:
-</para>
-
-<para>
- <screen>
-##########################################################################
-# Images:
-##########################################################################
-
-# Define which file types will be treated as images, in case they get
-# blocked further down this file:
-#
-{ +<link linkend="HANDLE-AS-IMAGE">handle-as-image</link> }
-/.*\.(gif|jpe?g|png|bmp|ico)$</screen>
-</para>
-
-<para>
- And then there are known banner sources. They often use scripts to
- generate the banners, so it won't be visible from the URL that the
- request is for an image. Hence we block them <emphasis>and</emphasis>
- mark them as images in one go, with the help of our
- <literal>+block-as-image</literal> alias defined above. (We could of
- course just as well use <literal>+<link linkend="block">block</link>
- +<link linkend="handle-as-image">handle-as-image</link></literal> here.)
- Remember that the type of the replacement image is chosen by the
- <literal><link linkend="set-image-blocker">set-image-blocker</link></literal>
- action. Since all URLs have matched the default section with its
- <literal>+<link linkend="set-image-blocker">set-image-blocker</link>{pattern}</literal>
- action before, it still applies and needn't be repeated:
-</para>
-
-<para>
- <screen>
-# Known ad generators:
-#
-{ +block-as-image }
-ar.atwola.com
-.ad.doubleclick.net
-.ad.*.doubleclick.net
-.a.yimg.com/(?:(?!/i/).)*$
-.a[0-9].yimg.com/(?:(?!/i/).)*$
-bs*.gsanet.com
-.qkimg.net</screen>
-</para>
-
-<para>
- One of the most important jobs of <application>Privoxy</application>
- is to block banners. Many of these can be <quote>blocked</quote>
- by the <literal><link linkend="filter">filter</link>{banners-by-size}</literal>
- action, which, where it is enabled, deletes the references to banner
- images from the pages while they are loaded, so the browser doesn't request
- them anymore, and hence they don't need to be blocked here. But this naturally
- doesn't catch all banners, and some people choose not to use filters, so we
- need a comprehensive list of patterns for banner URLs here, and apply the
- <literal><link linkend="block">block</link></literal> action to them.
-</para>
-<para>
- First come many generic patterns, which do most of the work by
- matching typical domain and path name components of banners. Then comes
- a list of individual patterns for specific sites, which is omitted here
- to keep the example short:
-</para>
-
-<para>
- <screen>
-##########################################################################
-# Block these fine banners:
-##########################################################################
-{ <link linkend="BLOCK">+block{Banner ads.}</link> }
-
-# Generic patterns:
-#
-ad*.
-.*ads.
-banner?.
-count*.
-/.*count(er)?\.(pl|cgi|exe|dll|asp|php[34]?)
-/(?:.*/)?(publicite|werbung|rekla(ma|me|am)|annonse|maino(kset|nta|s)?)/
-
-# Site-specific patterns (abbreviated):
-#
-.hitbox.com</screen>
-</para>
-
-<para>
- It's quite remarkable how many advertisers actually call their banner
- servers ads.<replaceable>company</replaceable>.com, or call the directory
- in which the banners are stored simply <quote>banners</quote>. So the above
- generic patterns are surprisingly effective.
-</para>
-<para>
- But being very generic, they necessarily also catch URLs that we don't want
- to block. The pattern <literal>.*ads.</literal>, for example, catches
- <quote>nasty-<emphasis>ads</emphasis>.nasty-corp.com</quote> as intended,
- but also <quote>downlo<emphasis>ads</emphasis>.sourceforge.net</quote> or
- <quote><emphasis>ads</emphasis>l.some-provider.net</quote>. So here come some
- well-known exceptions to the <literal>+<link linkend="BLOCK">block</link></literal>
- section above.
-</para>
-<para>
- Note that these are exceptions to exceptions from the default! Consider the URL
- <quote>downloads.sourceforge.net</quote>: Initially, all actions are deactivated,
- so it wouldn't get blocked. Then comes the defaults section, which matches the
- URL, but just deactivates the <literal><link linkend="BLOCK">block</link></literal>
- action once again. Then it matches <literal>.*ads.</literal>, an exception to the
- general non-blocking policy, and suddenly
- <literal><link linkend="BLOCK">+block</link></literal> applies. And now, it'll match
- <literal>.*loads.</literal>, where <literal><link linkend="BLOCK">-block</link></literal>
- applies, so (unless it matches <emphasis>again</emphasis> further down) it ends up
- with no <literal><link linkend="BLOCK">block</link></literal> action applying.
-</para>
-
-<para>
- <screen>
-##########################################################################
-# Save some innocent victims of the above generic block patterns:
-##########################################################################
-
-# By domain:
-#
-{ -<link linkend="BLOCK">block</link> }
-adv[io]*. # (for advogato.org and advice.*)
-adsl. # (has nothing to do with ads)
-adobe. # (has nothing to do with ads either)
-ad[ud]*. # (adult.* and add.*)
-.edu # (universities don't host banners (yet!))
-.*loads. # (downloads, uploads etc)
-
-# By path:
-#
-/.*loads/
-
-# Site-specific:
-#
-www.globalintersec.com/adv # (adv = advanced)
-www.ugu.com/sui/ugu/adv</screen>
-</para>
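-
-<para>
- The walk-through above can be summarized as a trace. This is an
- illustrative sketch in comment form, not output produced by
- <application>Privoxy</application>, and it only follows the
- <literal>block</literal> action through the two sections shown in this example:
-</para>
-
-<para>
- <screen>
-# Request: downloads.sourceforge.net/
-#
-# Start of matching         all actions disabled      block: off
-# { +block{Banner ads.} }   .*ads.   matches          block: on
-# { -block }                .*loads. matches          block: off
-#
-# Result: unless a later section matches again, the request is not blocked.</screen>
-</para>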
-
-<para>
- Filtering source code can have nasty side effects,
- so make an exception for our friends at sourceforge.net,
- and all paths with <quote>cvs</quote> in them. Note that
- <literal>-<link linkend="FILTER">filter</link></literal>
- disables <emphasis>all</emphasis> filters in one fell swoop!
-</para>
-
-<para>
- <screen>
-# Don't filter code!
-#
-{ -<link linkend="FILTER">filter</link> }
-/(.*/)?cvs
-bugzilla.
-developer.
-wiki.
-.sourceforge.net</screen>
-</para>
-
-<para>
- The actual <filename>default.action</filename> is of course much more
- comprehensive, but we hope this example made clear how it works.
-</para>
-
-</sect3>
-
-<sect3><title>user.action</title>
-
-<para>
- So far we have been painting with a broad brush by setting general policies,
- which would be a reasonable starting point for many people. Now,
- you might want to be more specific and have customized rules that
- are more suitable to your personal habits and preferences. These would
- be for narrowly defined situations like your ISP or your bank, and should
- be placed in <filename>user.action</filename>, which is parsed after all other
- actions files and hence has the last word, overriding any previously
- defined actions. <filename>user.action</filename> is also a
- <emphasis>safe</emphasis> place for your personal settings, since
- <filename>default.action</filename> is actively maintained by the
- <application>Privoxy</application> developers and you'll probably want
- to install updated versions from time to time.
-</para>
-
-<para>
- So let's look at a few examples of things that one might typically do in
- <filename>user.action</filename>:
-</para>
-
-
-<!-- brief sample user.action here -->
-
-<para>
- <screen>
-# My user.action file. <fred@example.com></screen>
-</para>
-
-<para>
- As <link linkend="aliases">aliases</link> are local to the actions
- file that they are defined in, you can't use the ones from
- <filename>default.action</filename>, unless you repeat them here:
-</para>
-
-<para>
- <screen>
-# Aliases are local to the file they are defined in.
-# (Re-)define aliases for this file:
-#
-{{alias}}
-#
-# These aliases just save typing later, and the alias names should
-# be self explanatory.
-#
-+crunch-all-cookies = +crunch-incoming-cookies +crunch-outgoing-cookies
--crunch-all-cookies = -crunch-incoming-cookies -crunch-outgoing-cookies
- allow-all-cookies = -crunch-all-cookies -session-cookies-only
- allow-popups = -filter{all-popups}
-+block-as-image = +block{Blocked as image.} +handle-as-image
--block-as-image = -block
-
-# These aliases define combinations of actions that are useful for
-# certain types of sites:
-#
-fragile = -block -crunch-all-cookies -filter -fast-redirects -hide-referrer
-shop = -crunch-all-cookies allow-popups
-
-# Allow ads for selected useful free sites:
-#
-allow-ads = -block -filter{banners-by-size} -filter{banners-by-link}
-
-# Alias for specific file types that are text, but might have conflicting
-# MIME types. We want the browser to force these to be text documents.
-handle-as-text = -<link linkend="FILTER">filter</link> +<link linkend="content-type-overwrite">content-type-overwrite{text/plain}</link> +<link linkend="FORCE-TEXT-MODE">force-text-mode</link> -<link linkend="HIDE-CONTENT-DISPOSITION">hide-content-disposition</link></screen>
-
-</para>
-
-<para>
- Say you have accounts on some sites that you visit regularly, and
- you don't want to have to log in manually each time. So you'd like
- to allow persistent cookies for these sites. The
- <literal>allow-all-cookies</literal> alias defined above does exactly
- that: it disables the crunching of cookies in both directions, as well as
- the rewriting of cookies that would make them temporary (session-only).
-</para>
-
-<para>
- <screen>
-{ allow-all-cookies }
- sourceforge.net
- .yahoo.com
- .msdn.microsoft.com
- .redhat.com</screen>
-</para>
-
-<para>
- Your bank is allergic to some filter, but you don't know which, so you disable them all:
-</para>
-
-<para>
- <screen>
-{ -<link linkend="FILTER">filter</link> }
- .your-home-banking-site.com</screen>
-</para>
-
-<para>
- Some file types you may not want to filter for various reasons:
-</para>
-
-<para>
- <screen>
-{ -<link linkend="FILTER">filter</link> }
-# Technical documentation is likely to contain strings that might
-# erroneously get altered by the JavaScript-oriented filters:
-#
-.tldp.org
-/(.*/)?selfhtml/
-
-# And this stupid host sends streaming video with a wrong MIME type,
-# so that Privoxy thinks it is getting HTML and starts filtering:
-#
-stupid-server.example.com/</screen>
-</para>
-
-<para>
- Here is an example of a simple <link linkend="BLOCK">block</link> action. Say you've
- seen an ad on your favourite page on example.com that you want to get rid of.
- You right-click the image, select <quote>copy image location</quote>,
- and paste the URL, minus the leading http://, into a
- <literal>{ +block{} }</literal> section. Note that <literal>{ +handle-as-image
- }</literal> need not be specified, since all URLs ending in
- <literal>.gif</literal> will be tagged as images by the general rules set
- in <filename>default.action</filename> anyway:
-</para>
-
-<para>
- <screen>
-{ +<link linkend="BLOCK">block</link>{Nasty ads.} }
- www.example.com/nasty-ads/sponsor\.gif
- another.example.net/more/junk/here/</screen>
-</para>
-
-<para>
- The URLs of dynamically generated banners, especially from large banner
- farms, often don't use the well-known image file name extensions, which
- makes it impossible for <application>Privoxy</application> to guess
- the file type just by looking at the URL.
- You can use the <literal>+block-as-image</literal> alias defined above for
- these cases.
- Note that objects which match this rule but then turn out NOT to be an
- image are typically rendered as a <quote>broken image</quote> icon by the
- browser. Use cautiously.
-</para>
-
-<para>
- <screen>
-{ +block-as-image }
- .doubleclick.net
- .fastclick.net
- /Realmedia/ads/
- ar.atwola.com/</screen>
-</para>
-
-<para>
- Now you noticed that the default configuration breaks Forbes Magazine,
- but you were too lazy to find out which action is the culprit, and you
- were again too lazy to give <link linkend="contact">feedback</link>, so
- you just used the <literal>fragile</literal> alias on the site, and
- -- <emphasis>whoa!</emphasis> -- it worked. The <literal>fragile</literal>
- alias disables those actions that are most likely to break a site. It is
- also useful for testing whether <application>Privoxy</application> is
- causing a problem at all. You later find other regular sites
- that misbehave, and add those to your personalized list of troublemakers:
-
-<para>
-<screen>
-{ fragile }
- .forbes.com
- webmail.example.com
- .mybank.com</screen>
-</para>
-
-<para>
- You like the <quote>fun</quote> text replacements in <filename>default.filter</filename>,
- but they are disabled in the distributed actions file.
- So you'd like to turn them on in your private,
- update-safe config, once and for all:
-</para>
-
-<para>
-<screen>
-{ +<link linkend="filter-fun">filter{fun}</link> }
- / # For ALL sites!</screen>
-</para>
-
-<para>
- Note that the above is not really a good idea: There are exceptions
- to the filters in <filename>default.action</filename> for things that
- really shouldn't be filtered, like code on CVS->Web interfaces. Since
- <filename>user.action</filename> has the last word, these exceptions
- won't be valid for the <quote>fun</quote> filtering specified here.
-</para>
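-
-<para>
- One way to soften this, sketched here under the assumption that you repeat
- the relevant exceptions yourself, is to follow the catch-all section in
- <filename>user.action</filename> with a section that switches filtering
- off again. The patterns are borrowed from the
- <filename>default.action</filename> example above. Later sections take
- precedence over earlier ones, so the order matters, and note that
- <literal>-filter</literal> turns off all filters for these URLs, not just
- <quote>fun</quote>:
-</para>
-
-<para>
-<screen>
-{ +filter{fun} }
-/ # For ALL sites ...
-
-{ -filter }
-/(.*/)?cvs # ... except code, as in default.action
-.sourceforge.net</screen>
-</para>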
-
-<para>
- You might also worry about how your favourite free websites are
- funded, and find that they rely on displaying banner advertisements
- to survive. So you might want to specifically allow banners for those
- sites that you feel provide value to you:
-</para>
-
-<para>
-<screen>
-{ allow-ads }
- .sourceforge.net
- .slashdot.org
- .osdn.net</screen>
-</para>
-
-<para>
- Note that <literal>allow-ads</literal> has been aliased to
- <literal>-<link linkend="block">block</link></literal>,
- <literal>-<link linkend="filter-banners-by-size">filter{banners-by-size}</link></literal>, and
- <literal>-<link linkend="filter-banners-by-link">filter{banners-by-link}</link></literal> above.
-</para>
-
-<para>
- Invoke another alias here to force an override of the MIME type
- <literal>application/x-sh</literal>, which would typically open a download
- dialog. That way you can look at the shell script first, and then save
- it should you choose to.
-</para>
-
-<para>
-<screen>
-{ handle-as-text }
- /.*\.sh$</screen>
-</para>
-
-<para>
- <filename>user.action</filename> is generally the best place to define
- exceptions and additions to the default policies of
- <filename>default.action</filename>. For some actions it is also safe to set
- the default policy here, though. So let's set a default policy to have a
- <quote>blank</quote> image as opposed to the checkerboard pattern for
- <emphasis>ALL</emphasis> sites. <quote>/</quote> of course matches all URL
- paths and patterns:
-</para>
-
-<para>
-<screen>
-{ +<link linkend="set-image-blocker">set-image-blocker{blank}</link> }
-/ # ALL sites</screen>
-</para>
-
-</sect3>
-</sect2>
-
-<!-- ~ End section ~ -->
-
-</sect1>
-
-<!-- ~ End section ~ -->
-
-<!-- ~~~~~~~~ New section Header ~~~~~~~~~ -->
-
-<sect1 id="filter-file">
-<title>Filter Files</title>
-
-<para>
- On-the-fly text substitutions need
- to be defined in a <quote>filter file</quote>. Once defined, they
- can then be invoked as an <quote>action</quote>.
-</para>
-
-<para>
- &my-app; supports three different filter actions:
- <literal><link linkend="filter">filter</link></literal> to
- rewrite the content that is sent to the client,
- <literal><link linkend="client-header-filter">client-header-filter</link></literal>
- to rewrite headers that are sent by the client, and
- <literal><link linkend="server-header-filter">server-header-filter</link></literal>
- to rewrite headers that are sent by the server.
-</para>
-
-<para>
- &my-app; also supports two tagger actions:
- <literal><link linkend="client-header-tagger">client-header-tagger</link></literal>
- and
- <literal><link linkend="server-header-tagger">server-header-tagger</link></literal>.
- Taggers and filters use the same syntax in the filter files; the difference
- is that taggers don't modify the text they are filtering, but use a rewritten
- version of the filtered text as a tag. The tags can then be used to change the
- applying actions through sections with <link linkend="tag-pattern">tag-patterns</link>.
-</para>
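-
-<para>
- The following sketch shows how a tagger and a tag-pattern might work
- together. The tagger name <literal>user-agent</literal> and the
- User-Agent string are chosen for illustration only. First, a tagger in
- the filter file that uses the whole User-Agent header line as the tag:
-</para>
-
-<para>
- <screen>
-CLIENT-HEADER-TAGGER: user-agent Tag the request with its User-Agent header
-s@^user-agent:.*@$0@i</screen>
-</para>
-
-<para>
- Then, in an actions file, the tagger is enabled for all requests, and a
- later section with a tag-pattern changes the actions for requests
- that carry a matching tag:
-</para>
-
-<para>
- <screen>
-{ +client-header-tagger{user-agent} }
-/
-
-# Tagging alone changes nothing; this section does (illustrative agent name):
-#
-{ -filter }
-TAG:^User-Agent: fetch libfetch/</screen>
-</para>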
-
-
-<para>
- Multiple filter files can be defined through the <literal> <link
- linkend="filterfile">filterfile</link></literal> config directive. The filters
- as supplied by the developers are located in
- <filename>default.filter</filename>. It is recommended that any locally
- defined or modified filters go in a separately defined file such as
- <filename>user.filter</filename>.
- </para>
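-
-<para>
- In the main configuration file this could look like the following sketch.
- <filename>user.filter</filename> is the file name suggested above; the
- file has to exist for the second line to be useful:
-</para>
-
-<para>
- <screen>
-filterfile default.filter
-filterfile user.filter</screen>
-</para>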
-
-<para>
- Common tasks for content filters are to eliminate common annoyances in
- HTML and JavaScript, such as pop-up windows,
- exit consoles, crippled windows without navigation tools, the
- infamous <BLINK> tag, etc., to suppress images with certain
- width and height attributes (standard banner sizes or web-bugs),
- or just to have fun.
-</para>
-
-<para>
- Enabled content filters are applied to any content whose
- <quote>Content Type</quote> header is recognised as a sign
- of text-based content, with the exception of <literal>text/plain</literal>.
- Use the <link linkend="FORCE-TEXT-MODE">force-text-mode</link> action
- to also filter other content.
-</para>
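-
-<para>
- As a sketch, a section like the following would ask &my-app; to treat
- matching documents as text, so that the named content filter is applied to
- them as well. The host and path are placeholders, and
- <replaceable>name</replaceable> stands for whichever filter you want to
- apply:
-</para>
-
-<para>
- <screen>
-{ +force-text-mode +filter{<replaceable>name</replaceable>} }
-files.example.org/.*\.txt$</screen>
-</para>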
-
-<para>
- Substitutions are made at the source level, so if you want to <quote>roll
- your own</quote> filters, you should first be familiar with HTML syntax,
- and, of course, regular expressions.
-</para>
-
-<para>
- Just like the <link linkend="actions-file">actions files</link>, the
- filter file is organized in sections, which are called <emphasis>filters</emphasis>
- here. Each filter consists of a heading line that starts with one of the
- <emphasis>keywords</emphasis> <literal>FILTER:</literal>,
- <literal>CLIENT-HEADER-FILTER:</literal> or <literal>SERVER-HEADER-FILTER:</literal>
- followed by the filter's <emphasis>name</emphasis>, and a short (one line)
- <emphasis>description</emphasis> of what it does. Below that line
- come the <emphasis>jobs</emphasis>, i.e. lines that define the actual
- text substitutions. By convention, the name of a filter
- should describe what the filter <emphasis>eliminates</emphasis>. The
- description is used in the <ulink url="http://config.privoxy.org/">web-based
- user interface</ulink>.
-</para>
-
-<para>
- Once a filter called <replaceable>name</replaceable> has been defined
- in the filter file, it can be invoked by using an action of the form
- +<literal><link linkend="filter">filter</link>{<replaceable>name</replaceable>}</literal>
- in any <link linkend="actions-file">actions file</link>.
-</para>
-
-<para>
- Filter definitions start with a header line that contains the filter
- type, the filter name and the filter description.
- A content filter header line for a filter called <quote>foo</quote> could look
- like this:
-</para>
-
-<para>
- <screen>FILTER: foo Replace all "foo" with "bar"</screen>
-</para>
-
-<para>
- Below that line, and up to the next header line, come the jobs that
- define what text replacements the filter executes. They are specified
- in a syntax that imitates <ulink url="http://www.perl.org/">Perl</ulink>'s
- <literal>s///</literal> operator. If you are familiar with Perl, you
- will find this to be quite intuitive, and may want to look at the
- PCRS documentation for the subtle differences to Perl behaviour. Most
- notably, the non-standard option letter <literal>U</literal> is supported,
- which turns the default to ungreedy matching.
-</para>
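-
-<para>
- To illustrate the effect of <literal>U</literal>, here is a small sketch.
- The filter name is made up for this example, and the two jobs are
- alternatives, not meant to be used together. Assume the filtered text is
- <quote>a (b) c (d) e</quote>:
-</para>
-
-<para>
- <screen>
-FILTER: strip-parens Remove parenthesized text (illustration only)
-
-# Greedy (the default): .* runs from the first "(" to the last ")",
-# so a single match removes "(b) c (d)":
-s|\(.*\)||g
-
-# With U the quantifier becomes ungreedy, so "(b)" and "(d)" are
-# matched and removed separately, and " c " survives:
-s|\(.*\)||Ug</screen>
-</para>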
-
-<para>
- If you are new to
- <ulink url="http://en.wikipedia.org/wiki/Regular_expressions"><quote>Regular
- Expressions</quote></ulink>, you might want to take a look at
- the <link linkend="regex">Appendix on regular expressions</link>, and
- see the <ulink url="http://perldoc.perl.org/perlre.html">Perl
- manual</ulink> for
- <ulink url="http://perldoc.perl.org/perlop.html">the
- <literal>s///</literal> operator's syntax</ulink> and <ulink
- url="http://perldoc.perl.org/perlre.html">Perl-style regular
- expressions</ulink> in general.
- The below examples might also help to get you started.
-</para>
-
-
-<!-- ~~~~~~~~ New section Header ~~~~~~~~~ -->
-
-<sect2><title>Filter File Tutorial</title>
-<para>
- Now, let's complete our <quote>foo</quote> content filter. We have already defined
- the heading, but the jobs are still missing. Since all it does is to replace
- <quote>foo</quote> with <quote>bar</quote>, there is only one (trivial) job
- needed:
-</para>
-
-<para>
- <screen>s/foo/bar/</screen>
-</para>
-
-<para>
- But wait! Didn't the description say that <emphasis>all</emphasis> occurrences
- of <quote>foo</quote> should be replaced? Our current job will only take
- care of the first <quote>foo</quote> on each page. For global substitution,
- we'll need to add the <literal>g</literal> option:
-</para>
-
-<para>
- <screen>s/foo/bar/g</screen>
-</para>
-
-<para>
- Our complete filter now looks like this:
-</para>
-<para>
- <screen>FILTER: foo Replace all "foo" with "bar"
-s/foo/bar/g</screen>
-</para>
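-
-<para>
- To put the finished filter to work, it still has to be enabled in an
- actions file. A minimal sketch, with a placeholder host name:
-</para>
-
-<para>
- <screen>
-{ +filter{foo} }
-www.example.org</screen>
-</para>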
-
-<para>
- Let's look at some real filters for more interesting examples. Here you see
- a filter that protects against some common annoyances that arise from JavaScript
- abuse. Let's look at its jobs one after the other:
-</para>
-
-
-<para>
- <screen>
-FILTER: js-annoyances Get rid of particularly annoying JavaScript abuse
-
-# Get rid of JavaScript referrer tracking. Test page: http://www.randomoddness.com/untitled.htm
-#
-s|(<script.*)document\.referrer(.*</script>)|$1"Not Your Business!"$2|Usg</screen>
-</para>
-
-<para>
- Following the header line and a comment, you see the job. Note that it uses
- <literal>|</literal> as the delimiter instead of <literal>/</literal>, because
- the pattern contains a forward slash, which would otherwise have to be escaped
- by a backslash (<literal>\</literal>).
-</para>
-
-<para>
- Now, let's examine the pattern: it starts with the text <literal><script.*</literal>
- enclosed in parentheses. Since the dot matches any character, and <literal>*</literal>
- means: <quote>Match an arbitrary number of the element left of myself</quote>, this
- matches <quote><script</quote>, followed by <emphasis>any</emphasis> text, i.e.
- it matches the whole page, from the start of the first <script> tag.
-</para>
-
-<para>
- That's more than we want, but the pattern continues: <literal>document\.referrer</literal>
- matches only the exact string <quote>document.referrer</quote>. The dot needed to
- be <emphasis>escaped</emphasis>, i.e. preceded by a backslash, to take away its
- special meaning as a joker, and make it just a regular dot. So far, the meaning is:
- Match from the start of the first <script> tag in the page, up to, and including,
- the text <quote>document.referrer</quote>, if <emphasis>both</emphasis> are present
- in the page (and appear in that order).
-</para>
-
-<para>
- But there's still more pattern to go. The next element, again enclosed in parentheses,
- is <literal>.*</script></literal>. You already know what <literal>.*</literal>
- means, so the whole pattern translates to: Match from the start of the first <script>
- tag in a page to the end of the last <script> tag, provided that the text
- <quote>document.referrer</quote> appears somewhere in between.
-</para>
-
-<para>
- This is still not the whole story, since we have ignored the options and the parentheses:
- The portions of the page matched by sub-patterns that are enclosed in parentheses will be
- remembered and made available through the variables <literal>$1, $2, ...</literal> in
- the substitute. The <literal>U</literal> option switches to ungreedy matching, which means
- that the first <literal>.*</literal> in the pattern will only <quote>eat up</quote> all
- text in between <quote><script</quote> and the <emphasis>first</emphasis> occurrence
- of <quote>document.referrer</quote>, and that the second <literal>.*</literal> will
- only span the text up to the <emphasis>first</emphasis> <quote></script></quote>
- tag. Furthermore, the <literal>s</literal> option says that the match may span
- multiple lines in the page, and the <literal>g</literal> option again means that the
- substitution is global.
-</para>
-
-<para>
- So, to summarize, the pattern means: Match all scripts that contain the text
- <quote>document.referrer</quote>. Remember the parts of the script from
- (and including) the start tag up to (and excluding) the string
- <quote>document.referrer</quote> as <literal>$1</literal>, and the part following
- that string, up to and including the closing tag, as <literal>$2</literal>.
-</para>
-
-<para>
- Now the pattern is deciphered, but wasn't this about substituting things? So
- let's look at the substitute: <literal>$1"Not Your Business!"$2</literal> is
- easy to read: The text remembered as <literal>$1</literal>, followed by
- <literal>"Not Your Business!"</literal> (<emphasis>including</emphasis>
- the quotation marks!), followed by the text remembered as <literal>$2</literal>.
- This produces an exact copy of the original string, with the middle part
- (the <quote>document.referrer</quote>) replaced by <literal>"Not Your
- Business!"</literal>.
-</para>
-
-<para>
- The whole job now reads: Replace <quote>document.referrer</quote> by
- <literal>"Not Your Business!"</literal> wherever it appears inside a
- <script> tag. Note that this job won't break JavaScript syntax,
- since both the original and the replacement are syntactically valid
- string objects. The script just won't have access to the referrer
- information anymore.
-</para>
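-
-<para>
- To experiment with this job outside of a filter file, it can be approximated
- in Python. This is only a sketch under assumptions: Python's
- <literal>re</literal> module has no <literal>U</literal> option, so the lazy
- quantifier <literal>.*?</literal> stands in for it, back-references in the
- substitute are written <literal>\1</literal> and <literal>\2</literal>
- instead of <literal>$1</literal> and <literal>$2</literal>, and the function
- name <literal>hide_referrer</literal> is made up for this example:
-</para>
-

```python
import re

# re.S lets the dot match newlines (the s option); .*? emulates the U option.
JOB = re.compile(r'(<script.*?)document\.referrer(.*?</script>)', re.S)


def hide_referrer(page: str) -> str:
    """Replace document.referrer inside scripts, like the g option would."""
    # \1 and \2 play the role of $1 and $2 in the substitute.
    return JOB.sub(r'\1"Not Your Business!"\2', page)


page = '<script>\nvar r = document.referrer;\n</script>'
result = hide_referrer(page)
print(result)
```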
-
-<para>
- We'll show you two other jobs from the JavaScript taming department, but
- this time only point out the constructs of special interest:
-</para>
-
-<para>
- <screen>
-# The status bar is for displaying link targets, not pointless blahblah
-#
-s/window\.status\s*=\s*(['"]).*?\1/dUmMy=1/ig</screen>
-</para>
-
-<para>
- <literal>\s</literal> stands for whitespace characters (space, tab, newline,
- carriage return, form feed), so that <literal>\s*</literal> means: <quote>zero
- or more whitespace</quote>. The <literal>?</literal> in <literal>.*?</literal>
- makes this matching of arbitrary text ungreedy. (Note that the <literal>U</literal>
- option is not set). The <literal>['"]</literal> construct means: <quote>a single
- <emphasis>or</emphasis> a double quote</quote>. Finally, <literal>\1</literal> is
- a back-reference to the first parenthesis just like <literal>$1</literal> above,
- with the difference that in the <emphasis>pattern</emphasis>, a backslash indicates
- a back-reference, whereas in the <emphasis>substitute</emphasis>, it's the dollar.
-</para>
-
-<para>
- So what does this job do? It replaces assignments of single- or double-quoted
- strings to the <quote>window.status</quote> object with a dummy assignment
- (using a variable name that is hopefully odd enough not to conflict with
- real variables in scripts). Thus, it catches many cases where e.g. pointless
- descriptions are displayed in the status bar instead of the link target when
- you move your mouse over links.
-</para>
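-
-<para>
- The effect of the back-reference can be checked with a short Python sketch
- (Python's <literal>re</literal> module is used as an assumed stand-in for
- the filter engine, and <literal>STATUS_JOB</literal> is a name chosen for
- this example). The closing quote has to be the same kind as the opening one:
-</para>
-

```python
import re

# (['"]) remembers which quote opened the string; \1 demands the same one
# again. re.I corresponds to the i option; .sub() replaces globally (g).
STATUS_JOB = re.compile(r"""window\.status\s*=\s*(['"]).*?\1""", re.I)

single = STATUS_JOB.sub("dUmMy=1", "window.status = 'Free stuff!';")
nested = STATUS_JOB.sub("dUmMy=1", 'window.status="it\'s here";')
broken = STATUS_JOB.sub("dUmMy=1", "window.status = 'no closing quote\";")

print(single)
print(nested)
print(broken)
```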
-
-<para>
- <screen>
-# Kill OnUnload popups. Yummy. Test: http://www.zdnet.com/zdsubs/yahoo/tree/yfs.html
-#
-s/(<body [^>]*)onunload(.*>)/$1never$2/iU</screen>
-</para>
-
-<para>
- Including the
- <ulink url="http://www.w3.org/TR/2000/REC-DOM-Level-2-Events-20001113/events.html#Events-eventgroupings-htmlevents">OnUnload
- event binding</ulink> in the HTML DOM was a <emphasis>CRIME</emphasis>.
- When I close a browser window, I want it to close and die. Basta.
- This job replaces the <quote>onunload</quote> attribute in
- <quote><body></quote> tags with the dummy word <literal>never</literal>.
- Note that the <literal>i</literal> option makes the pattern matching
- case-insensitive. Also note that ungreedy matching alone doesn't always guarantee
- a minimal match: In the first parenthesis, we had to use <literal>[^>]*</literal>
- instead of <literal>.*</literal> to prevent the match from exceeding the
- <body> tag if it doesn't contain <quote>OnUnload</quote>, but the page's
- content does.
-</para>
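-
-<para>
- The following Python sketch illustrates why <literal>[^>]*</literal> matters.
- It is an approximation only: the <literal>U</literal> option is emulated with
- lazy quantifiers, the missing <literal>g</literal> option with
- <literal>count=1</literal>, and the names <literal>ONUNLOAD_JOB</literal> and
- <literal>kill_onunload</literal> are invented for this example:
-</para>
-

```python
import re

# [^>]*? cannot run past the end of the <body ...> tag, so text in the
# page content is never mistaken for an attribute.
ONUNLOAD_JOB = re.compile(r'(<body [^>]*?)onunload(.*?>)', re.I)


def kill_onunload(html: str) -> str:
    # count=1 mirrors the job's lack of the g option.
    return ONUNLOAD_JOB.sub(r'\1never\2', html, count=1)


with_handler = kill_onunload('<body bgcolor="white" onUnload="popup()">')
content_only = kill_onunload('<body class="x"><p>the onunload event</p>')

print(with_handler)
print(content_only)
```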
-
-<para>
- The last example is from the fun department:
-</para>
-
-<para>
- <screen>
-FILTER: fun Fun text replacements
-
-# Spice the daily news:
-#
-s/microsoft(?!\.com)/MicroSuck/ig</screen>
-</para>
-
-<para>
- Note the <literal>(?!\.com)</literal> part (a so-called negative lookahead)
- in the job's pattern, which means: Don't match, if the string
- <quote>.com</quote> appears directly following <quote>microsoft</quote>
- in the page. This prevents links to microsoft.com from being trashed, while
- still replacing the word everywhere else.
-</para>
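-
-<para>
- Negative lookahead works the same way in Python's <literal>re</literal>
- module, so the job can be tried there as a quick sketch (the sample text is
- made up):
-</para>
-

```python
import re

# (?!\.com) looks ahead without consuming text: the match is rejected
# when ".com" directly follows "microsoft".
FUN_JOB = re.compile(r"microsoft(?!\.com)", re.I)

text = "Microsoft news at http://www.microsoft.com/ today"
result = FUN_JOB.sub("MicroSuck", text)
print(result)
```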
-
-<para>
- <screen>
-# Buzzword Bingo (example for extended regex syntax)
-#
-s* industry[ -]leading \
-| cutting[ -]edge \
-| customer[ -]focused \
-| market[ -]driven \
-| award[ -]winning # Comments are OK, too! \
-| high[ -]performance \
-| solutions[ -]based \
-| unmatched \
-| unparalleled \
-| unrivalled \
-*<font color="red"><b>BINGO!</b></font> \
-*igx</screen>
-</para>
-
-<para>
- The <literal>x</literal> option in this job turns on extended syntax, and allows for
- e.g. the liberal use of (non-interpreted!) whitespace for nicer formatting.
-</para>
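-
-<para>
- Python offers a comparable extended syntax through its
- <literal>re.X</literal> flag, which makes for a convenient test bed. This is
- a sketch, not the filter engine itself; note that in Python, too, whitespace
- inside a character class such as <literal>[ -]</literal> stays significant.
- The names <literal>BUZZWORDS</literal> and <literal>bingo</literal> are
- invented for this example:
-</para>
-

```python
import re

# re.X ignores whitespace outside character classes and allows comments.
BUZZWORDS = re.compile(r"""
      industry[ -]leading
    | cutting[ -]edge
    | customer[ -]focused
    | market[ -]driven
    | award[ -]winning     # Comments are OK, too!
    | high[ -]performance
    | solutions[ -]based
    | unmatched
    | unparalleled
    | unrivalled
""", re.I | re.X)

BINGO = '<font color="red"><b>BINGO!</b></font>'


def bingo(text: str) -> str:
    """Replace every buzzword, case-insensitively and globally."""
    return BUZZWORDS.sub(BINGO, text)


result = bingo("Our Cutting-Edge, award winning tool")
print(result)
```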
-
-<para>
- You get the idea?
-</para>
-</sect2>
-
-<!-- ~~~~~~~~ New section Header ~~~~~~~~~ -->
-
-<sect2 id="predefined-filters"><title>The Pre-defined Filters</title>
-
-<!--
-
- Note each filter is also listed in the +filter action section above. Please
- keep these listings in sync.
-
--->
-
-<para>
-The distribution <filename>default.filter</filename> file contains a selection of
-pre-defined filters for your convenience:
-</para>
-
-<variablelist>
- <varlistentry>
- <term><emphasis>js-annoyances</emphasis></term>
- <listitem>
- <para>
- The purpose of this filter is to get rid of particularly annoying JavaScript abuse.
- To that end, it
- <itemizedlist>
- <listitem>
- <para>
- replaces JavaScript references to the browser's referrer information
- with the string "Not Your Business!". This complements the <literal><link
- linkend="hide-referrer">hide-referrer</link></literal> action on the content level.
- </para>
- </listitem>
- <listitem>
- <para>
- removes the bindings to the DOM's
- <ulink url="http://www.w3.org/TR/2000/REC-DOM-Level-2-Events-20001113/events.html#Events-eventgroupings-htmlevents">unload
- event</ulink> which we feel has no right to exist and is responsible for most <quote>exit consoles</quote>, i.e.
- nasty windows that pop up when you close another one.
- </para>
- </listitem>
- <listitem>
- <para>
- removes code that causes new windows to be opened with undesired properties, such as being
- full-screen, non-resizeable, without location, status or menu bar etc.
- </para>
- </listitem>
- </itemizedlist>
- </para>
- <para>
- Use with caution. This is an aggressive filter, and can break sites that
- rely heavily on JavaScript.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><emphasis>js-events</emphasis></term>
- <listitem>
- <para>
- This is a very radical measure. It removes virtually all JavaScript event bindings, which
- means that scripts can no longer react to user actions such as mouse movements or clicks,
- window resizing etc. Use with caution!
- </para>
- <para>
- We <emphasis>strongly discourage</emphasis> using this filter as a default since it breaks
- many legitimate scripts. It is meant for use only on extra-nasty sites (should you really
- need to go there).
- </para>
- </listitem>
- </varlistentry>
-
-<varlistentry>
- <term><emphasis>html-annoyances</emphasis></term>
- <listitem>
- <para>
- This filter will undo many common instances of HTML based abuse.
- </para>
- <para>
- The <literal>BLINK</literal> and <literal>MARQUEE</literal> tags
- are neutralized (yeah baby!), and browser windows will be created as
- resizeable (as of course they should be!), and will have location,
- scroll and menu bars -- even if specified otherwise.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><emphasis>content-cookies</emphasis></term>
- <listitem>
- <para>
- Most cookies are set in the HTTP dialog, where they can be intercepted
- by the
- <literal><link linkend="crunch-incoming-cookies">crunch-incoming-cookies</link></literal>
- and <literal><link linkend="crunch-outgoing-cookies">crunch-outgoing-cookies</link></literal>
- actions. But web sites increasingly make use of HTML meta tags and JavaScript
- to sneak cookies to the browser on the content level.
- </para>
- <para>
- This filter disables most HTML and JavaScript code that reads or sets
- cookies. It cannot detect all clever uses of these types of code, so it
- should not be relied on as an absolute fix. Use it wherever you would also
- use the cookie crunch actions.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><emphasis>refresh-tags</emphasis></term>
- <listitem>
- <para>
- Disable any refresh tags if the interval is greater than nine seconds (so
- that redirections done via refresh tags are not destroyed). This is useful
- for dial-on-demand setups, or for those who find this HTML feature
- annoying.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><emphasis>unsolicited-popups</emphasis></term>
- <listitem>
- <para>
- This filter attempts to prevent only <quote>unsolicited</quote> pop-up
- windows from opening, yet still allow pop-up windows that the user
- has explicitly chosen to open. It was added in version 3.0.1,
- as an improvement over earlier such filters.
- </para>
- <para>
- Technical note: The filter works by redefining the window.open JavaScript
- function to a dummy function, <literal>PrivoxyWindowOpen()</literal>,
- during the loading and rendering phase of each HTML page access, and
- restoring the function afterward.
- </para>
- <para>
- This is recommended only for browsers that cannot perform this function
- reliably themselves. And be aware that some sites require such windows
- in order to function normally. Use with caution.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><emphasis>all-popups</emphasis></term>
- <listitem>
- <para>
- Attempt to prevent <emphasis>all</emphasis> pop-up windows from opening.
- Note this should be used with even more discretion than the above, since
- it is more likely to break some sites that require pop-ups for normal
- usage. Use with caution.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><emphasis>img-reorder</emphasis></term>
- <listitem>
- <para>
- This is a helper filter that has no value if used alone. It makes the
- <literal>banners-by-size</literal> and <literal>banners-by-link</literal>
- (see below) filters more effective and should be enabled together with them.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><emphasis>banners-by-size</emphasis></term>
- <listitem>
- <para>
- This filter removes image tags purely based on what size they are. Fortunately
- for us, many ads and banner images tend to conform to certain standardized
- sizes, which makes this filter quite effective for ad stripping purposes.
- </para>
- <para>
- Occasionally this filter will cause false positives on images that are not ads,
- but just happen to be of one of the standard banner sizes.
- </para>
- <para>
- Recommended only for those who require extreme ad blocking. The default
- block rules should catch 95+% of all ads <emphasis>without</emphasis> this filter enabled.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><emphasis>banners-by-link</emphasis></term>
- <listitem>
- <para>
- This is an experimental filter that attempts to kill any banners if
- their URLs seem to point to known or suspected click trackers. It is currently
- not of much value and is not recommended for use by default.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><emphasis>webbugs</emphasis></term>
- <listitem>
- <para>
- Webbugs are small, invisible images (technically 1X1 GIF images), that
- are used to track users across websites, and collect information on them.
- As an HTML page is loaded by the browser, an embedded image tag causes the
- browser to contact a third-party site, disclosing the tracking information
- through the requested URL and/or cookies for that third-party domain, without
- the user ever becoming aware of the interaction with the third-party site.
- HTML-ized spam also uses a similar technique to verify email addresses.
- </para>
- <para>
- This filter removes the HTML code that loads such <quote>webbugs</quote>.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><emphasis>tiny-textforms</emphasis></term>
- <listitem>
- <para>
- A rather special-purpose filter that can be used to enlarge textareas (those
- multi-line text boxes in web forms) and turn off hard word wrap in them.
- It was written for the sourceforge.net tracker system where such boxes are
- a nuisance, but it can be handy on other sites, too.
- </para>
- <para>
- It is not recommended to use this filter as a default.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><emphasis>jumping-windows</emphasis></term>
- <listitem>
- <para>
- Many consider windows that move or resize themselves to be abusive. This filter
- neutralizes the related JavaScript code. Note that some sites might not display
- or behave as intended when using this filter. Use with caution.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><emphasis>frameset-borders</emphasis></term>
- <listitem>
- <para>
- Some web designers seem to assume that everyone in the world will view their
- web sites using the same browser brand and version, screen resolution etc,
- because only that assumption could explain why they'd use static frame sizes,
- yet prevent their frames from being resized by the user, should they be too
- small to show their whole content.
- </para>
- <para>
- This filter removes the related HTML code. It should only be applied to sites
- which need it.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><emphasis>demoronizer</emphasis></term>
- <listitem>
- <para>
- Many Microsoft products that generate HTML use non-standard extensions (read:
- violations) of the ISO 8859-1 aka Latin-1 character set. This can cause those
- HTML documents to display with errors on standard-compliant platforms.
- </para>
- <para>
- This filter translates the MS-only characters into Latin-1 equivalents.
- It is not necessary when using MS products, and will cause corruption of
- all documents that use 8-bit character sets other than Latin-1. It's mostly
- worthwhile for Europeans on non-MS platforms who sometimes see weird garbage
- characters on some pages, or for user agents that don't correct for this on
- the fly.
-<!--
- My version of Mozilla (ancient) shows litte square boxes for quote
- characters, and apostrophes on moronized pages. So many pages have this, I
- can read them fine now. HB 08/27/06
--->
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><emphasis>shockwave-flash</emphasis></term>
- <listitem>
- <para>
- A filter for shockwave haters. As the name suggests, this filter strips code
- out of web pages that is used to embed shockwave flash objects.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><emphasis>quicktime-kioskmode</emphasis></term>
- <listitem>
- <para>
- Change HTML code that embeds Quicktime objects so that kioskmode, which
- prevents saving, is disabled.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><emphasis>fun</emphasis></term>
- <listitem>
- <para>
- Text replacements for subversive browsing fun. Make fun of your favorite
- Monopolist or play buzzword bingo.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><emphasis>crude-parental</emphasis></term>
- <listitem>
- <para>
- A demonstration-only filter that shows how <application>Privoxy</application>
- can be used to delete web content on a keyword basis.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><emphasis>ie-exploits</emphasis></term>
- <listitem>
- <para>
- An experimental collection of text replacements to disable malicious HTML and JavaScript
- code that exploits known security holes in Internet Explorer.
- </para>
- <para>
- Presently, it only protects against Nimda and a cross-site scripting bug, and
- would need active maintenance to provide more substantial protection.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><emphasis>site-specifics</emphasis></term>
- <listitem>
- <para>
- Some web sites have very specific problems, the cure for which doesn't apply
- anywhere else, or could even cause damage on other sites.
- </para>
- <para>
- This is a collection of such site-specific cures which should only be applied
- to the sites they were intended for, which is what the supplied
- <filename>default.action</filename> file does. Users shouldn't need to change
- anything regarding this filter.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><emphasis>google</emphasis></term>
- <listitem>
- <para>
- A CSS based block for Google text ads. Also removes a width limitation
- and the toolbar advertisement.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><emphasis>yahoo</emphasis></term>
- <listitem>
- <para>
- Another CSS based block, this time for Yahoo text ads. It also removes
- a width limitation.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><emphasis>msn</emphasis></term>
- <listitem>
- <para>
- Another CSS based block, this time for MSN text ads. It also removes
- tracking URLs, as well as a width limitation.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><emphasis>blogspot</emphasis></term>
- <listitem>
- <para>
- Cleans up some Blogspot blogs. Read the fine print before using this one!
- </para>
- <para>
- This filter also intentionally removes some navigation stuff and sets the
- page width to 100%. As a result, some rounded <quote>corners</quote> would
- appear too early or not at all. Since fixing this would require a browser
- that understands background-size (CSS3), they are removed instead.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><emphasis>xml-to-html</emphasis></term>
- <listitem>
- <para>
- Server-header filter to change the Content-Type from xml to html.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><emphasis>html-to-xml</emphasis></term>
- <listitem>
- <para>
- Server-header filter to change the Content-Type from html to xml.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><emphasis>no-ping</emphasis></term>
- <listitem>
- <para>
- Removes the non-standard <literal>ping</literal> attribute from
- anchor and area HTML tags.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><emphasis>hide-tor-exit-notation</emphasis></term>
- <listitem>
- <para>
- Client-header filter to remove the <command>Tor</command> exit node notation
- found in Host and Referer headers.
- </para>
- <para>
- If &my-app; and <command>Tor</command> are chained and &my-app;
- is configured to use socks4a, one can use <quote>http://www.example.org.foobar.exit/</quote>
- to access the host <quote>www.example.org</quote> through the
- <command>Tor</command> exit node <quote>foobar</quote>.
- </para>
- <para>
- As the HTTP client isn't aware of this notation, it treats the
- whole string <quote>www.example.org.foobar.exit</quote> as host and uses it
- for the <quote>Host</quote> and <quote>Referer</quote> headers. From the
- server's point of view the resulting headers are invalid and can cause problems.
- </para>
- <para>
- An invalid <quote>Referer</quote> header can trigger <quote>hot-linking</quote>
- protections; an invalid <quote>Host</quote> header will make it impossible for
- the server to find the right vhost (several domains hosted on the same IP address).
- </para>
- <para>
- This client-header filter removes the <quote>foobar.exit</quote> part in those headers
- to prevent the mentioned problems. Note that it only modifies
- the HTTP headers; it doesn't make it impossible for the server
- to detect your <command>Tor</command> exit node based on the IP address
- the request is coming from.
- </para>
- </listitem>
- </varlistentry>
-
-<!--
- <varlistentry>
- <term><emphasis> </emphasis></term>
- <listitem>
- <para>
- </para>
- <para>
- </para>
- </listitem>
- </varlistentry>
--->
-</variablelist>
-
-</sect2>
-</sect1>
-
-<!-- ~ End section ~ -->
-
-
-
-<!-- ~~~~~ New section ~~~~~ -->
-
-<sect1 id="templates">
-<title>Privoxy's Template Files</title>
-<para>
- All <application>Privoxy</application> built-in pages, i.e. error pages such as the
- <ulink url="http://show-the-404-error.page"><quote>404 - No Such Domain</quote>
- error page</ulink>, the <ulink
- url="http://ads.bannerserver.example.com/nasty-ads/sponsor.html"><quote>BLOCKED</quote>
- page</ulink>
- and all pages of its <ulink url="http://config.privoxy.org/">web-based
- user interface</ulink>, are generated from <emphasis>templates</emphasis>.
- (<application>Privoxy</application> must be running for the above links to work as
- intended.)
-</para>
-
-<para>
- These templates are stored in a subdirectory of the <link linkend="confdir">configuration
- directory</link> called <filename>templates</filename>. On Unixish platforms,
- this is typically
- <ulink url="file:///etc/privoxy/templates/"><filename>/etc/privoxy/templates/</filename></ulink>.
-</para>
-
-<para>
- The templates are basically normal HTML files, but with place-holders (called symbols
- or exports), which <application>Privoxy</application> fills at run time. It
- is possible to edit the templates with a normal text editor, should you want
- to customize them. (<emphasis>Not recommended for the casual
- user</emphasis>). Should you create your own custom templates, you should use
- the <filename>config</filename> setting <link linkend="templdir">templdir</link>
- to specify an alternate location, so your templates do not get overwritten
- during upgrades.
- </para>
- <para>
- Note that just like in configuration files, lines starting
- with <literal>#</literal> are ignored when the templates are filled in.
-</para>
-
-<para>
- The place-holders are of the form <literal>@name@</literal>, and you will
- find a list of available symbols, which vary from template to template,
- in the comments at the start of each file. Note that these comments are not
- always accurate, and that it's probably best to look at the existing HTML
- code to find out which symbols are supported and what they are filled in with.
-</para>
-
-<para>
- A special application of this substitution mechanism is to make whole
- blocks of HTML code disappear when a specific symbol is set. We use this
- for many purposes, one of them being to include the beta warning in all
- our user interface (CGI) pages when <application>Privoxy</application>
- is in an alpha or beta development stage:
-</para>
-
-<para>
- <screen>
-<!-- @if-unstable-start -->
-
- ... beta warning HTML code goes here ...
-
-<!-- if-unstable-end@ --></screen>
-</para>
-
-<para>
- If the "unstable" symbol is set, everything in between and including
- <literal>@if-unstable-start</literal> and <literal>if-unstable-end@</literal>
- will disappear, leaving nothing but an empty comment:
-</para>
-
-<para>
- <screen><!-- --></screen>
-</para>
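-
-<para>
- The effect described above can be imitated with a plain regular expression
- substitution. The Python sketch below only mimics the visible result and is
- not taken from <application>Privoxy</application>'s source; the function
- name <literal>kill_block</literal> is hypothetical:
-</para>
-

```python
import re

template = (
    "<!-- @if-unstable-start -->\n"
    "\n"
    "  ... beta warning HTML code goes here ...\n"
    "\n"
    "<!-- if-unstable-end@ -->"
)


def kill_block(text: str, name: str) -> str:
    """Remove everything from @if-NAME-start through if-NAME-end@."""
    safe = re.escape(name)
    pattern = rf"@if-{safe}-start.*?if-{safe}-end@"
    # re.S lets .*? span the line breaks inside the block.
    return re.sub(pattern, "", text, flags=re.S)


result = kill_block(template, "unstable")
print(repr(result))
```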
-
-<para>
- There's also an if-then-else construct and an <literal>#include</literal>
- mechanism, but you'll surely find out if you are inclined to edit the
- templates ;-)
-</para>
-
-<para>
- All templates refer to a style located at
- <ulink url="http://config.privoxy.org/send-stylesheet"><literal>http://config.privoxy.org/send-stylesheet</literal></ulink>.
- This is, of course, locally served by <application>Privoxy</application>
- and the source for it can be found and edited in the
- <filename>cgi-style.css</filename> template.
-</para>
-
-</sect1>
-
-<!-- ~ End section ~ -->
-
-
-
-<!-- ~~~~~ New section ~~~~~ -->
-
-<sect1 id="contact"><title>Contacting the Developers, Bug Reporting and Feature
-Requests</title>
-
-<!-- Include contacting.sgml boilerplate: -->
- &contacting;
-<!-- end boilerplate -->
-
-</sect1>
-
-<!-- ~ End section ~ -->
-
-
-<!-- ~~~~~ New section ~~~~~ -->
-<sect1 id="copyright"><title>Privoxy Copyright, License and History</title>
-
-<!-- Include copyright.sgml: -->
- &copyright;
-<!-- end copyright -->
-
-<!-- ~~~~~ New section ~~~~~ -->
-<sect2><title>License</title>
-<!-- Include copyright.sgml: -->
- &license;
-<!-- end copyright -->
-</sect2>
-<!-- ~ End section ~ -->
-
-
-<!-- ~~~~~ New section ~~~~~ -->
-
-<sect2 id="history"><title>History</title>
-<!-- Include history.sgml: -->
- &history;
-<!-- end history -->
-</sect2>
-
-<sect2 id="authors"><title>Authors</title>
-<!-- Include p-authors.sgml: -->
- &p-authors;
-<!-- end authors -->
-</sect2>
-
-</sect1>
-
-<!-- ~ End section ~ -->
-
-
-<!-- ~~~~~ New section ~~~~~ -->
-<sect1 id="seealso"><title>See Also</title>
-<!-- Include seealso.sgml: -->
- &seealso;
-<!-- end seealso -->
-</sect1>
-
-
-
-<!-- ~~~~~ New section ~~~~~ -->
-<sect1 id="appendix"><title>Appendix</title>
-
-
-<!-- ~~~~~ New section ~~~~~ -->
-<sect2 id="regex">
-<title>Regular Expressions</title>
-<para>
- <application>Privoxy</application> uses Perl-style <quote>regular
- expressions</quote> in its <link linkend="actions-file">actions
- files</link> and <link linkend="filter-file">filter file</link>,
- through the <ulink url="http://www.pcre.org/">PCRE</ulink> and
-<!--
- dead 08/27/06
- <ulink url="http://www.oesterhelt.org/pcrs/">PCRS</ulink> libraries.
--->
- <application>PCRS</application> libraries.
-</para>
-
-<para>
- If you are reading this, you probably don't understand what <quote>regular
- expressions</quote> are, or what they can do. So this will be a very brief
- introduction only. A full explanation would require a <ulink
- url="http://www.oreilly.com/catalog/regex/">book</ulink> ;-)
-</para>
-
-<para>
- Regular expressions provide a language to describe patterns that can be
- run against strings of characters (letters, numbers, etc.), to see if they
- match the string or not. The patterns are themselves (sometimes complex)
- strings of literal characters, combined with wild-cards, and other special
- characters, called meta-characters. The <quote>meta-characters</quote> have
- special meanings and are used to build complex patterns to be matched against.
- Perl Compatible Regular Expressions are an especially convenient
- <quote>dialect</quote> of the regular expression language.
-</para>
-
-<para>
- To make a simple analogy, we do something similar when we use wild-card
- characters when listing files with the <command>dir</command> command in DOS.
- <literal>*.*</literal> matches all filenames. The <quote>special</quote>
- character here is the asterisk which matches any and all characters. We can be
- more specific and use <literal>?</literal> to match just individual
- characters. So <quote>dir file?.txt</quote> would match
- <quote>file1.txt</quote>, <quote>file2.txt</quote>, etc. We are pattern
- matching, using a similar technique to <quote>regular expressions</quote>!
-</para>
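-
-<para>
- The same kind of wild-card matching can be tried in Python with the
- <literal>fnmatch</literal> module. Its shell-style wild-cards are close in
- spirit to the DOS ones, though not identical in every corner case, and the
- file names below are made up:
-</para>
-

```python
from fnmatch import fnmatchcase

names = ["file1.txt", "file2.txt", "file10.txt", "notes.txt"]

# "?" stands for exactly one character, "*" for any run of characters.
one_char = [n for n in names if fnmatchcase(n, "file?.txt")]
any_run = [n for n in names if fnmatchcase(n, "file*.txt")]

print(one_char)
print(any_run)
```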
-
-<para>
- Regular expressions do essentially the same thing, but are much, much more
- powerful: there are many more <quote>special characters</quote> and ways of
- building complex patterns. Let's look at a few of the common ones,
- and then some examples:
-</para>
-
-<para><simplelist>
- <member>
- <emphasis>.</emphasis> - Matches any single character, e.g. <quote>a</quote>,
- <quote>A</quote>, <quote>4</quote>, <quote>:</quote>, or <quote>@</quote>.
- </member>
-</simplelist></para>
-
-<para><simplelist>
- <member>
- <emphasis>?</emphasis> - The preceding character or expression is matched ZERO or ONE
- times. Either/or.
- </member>
-</simplelist></para>
-
-<para><simplelist>
- <member>
- <emphasis>+</emphasis> - The preceding character or expression is matched ONE or MORE
- times.
- </member>
-</simplelist></para>
-
-<para><simplelist>
- <member>
- <emphasis>*</emphasis> - The preceding character or expression is matched ZERO or MORE
- times.
- </member>
-</simplelist></para>
-
-<para><simplelist>
- <member>
- <emphasis>\</emphasis> - The <quote>escape</quote> character denotes that
- the following character should be taken literally. This is used where one of the
- special characters (e.g. <quote>.</quote>) needs to be taken literally and
- not as a special meta-character. Example: <quote>example\.com</quote>, makes
- sure the period is recognized only as a period (and not expanded to its
- meta-character meaning of any single character).
- </member>
-</simplelist></para>
-
-<para><simplelist>
- <member>
- <emphasis>[ ]</emphasis> - Characters enclosed in brackets will be matched if
- any of the enclosed characters are encountered. For instance, <quote>[0-9]</quote>
- matches any numeric digit (zero through nine). As an example, we can combine
- this with <quote>+</quote> to match any digit one or more times: <quote>[0-9]+</quote>.
- </member>
-</simplelist></para>
-
-<para><simplelist>
- <member>
- <emphasis>( )</emphasis> - parentheses are used to group a sub-expression,
- or multiple sub-expressions.
- </member>
-</simplelist></para>
-
-<para><simplelist>
- <member>
- <emphasis>|</emphasis> - The <quote>bar</quote> character works like an
- <quote>or</quote> conditional statement. A match is successful if the
- sub-expression on either side of <quote>|</quote> matches. As an example:
- <quote>/(this|that) example/</quote> uses grouping and the bar character
- and would match either <quote>this example</quote> or <quote>that
- example</quote>, and nothing else.
- </member>
-</simplelist></para>
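-
-<para>
- Each of these meta-characters can be tried out interactively. The sketch
- below uses Python's <literal>re</literal> module, whose syntax agrees with
- Perl-style expressions for these basic constructs; the helper
- <literal>matches</literal> is a name invented for this example and tests
- whether a pattern matches a whole string:
-</para>
-

```python
import re


def matches(pattern: str, text: str) -> bool:
    """True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


checks = {
    "dot matches any single character": matches(r"a.c", "a@c"),
    "? allows zero occurrences": matches(r"colou?r", "color"),
    "? allows one occurrence": matches(r"colou?r", "colour"),
    "+ needs at least one digit": not matches(r"[0-9]+", ""),
    "* accepts zero digits": matches(r"[0-9]*", ""),
    "escaped dot is literal": not matches(r"example\.com", "exampleXcom"),
    "unescaped dot is a wild-card": matches(r"example.com", "exampleXcom"),
    "| picks either alternative": matches(r"(this|that) example", "that example"),
    "| rejects anything else": not matches(r"(this|that) example", "other example"),
}

for description, ok in checks.items():
    print(f"{description}: {ok}")
```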
-
-<para>
- These are just some of the meta-characters you are likely to use when matching URLs with
- <application>Privoxy</application>, and are a long way from a definitive
- list. This is enough to get us started with a few simple examples which may
- be more illuminating:
-</para>
-
-<para>
- <emphasis><literal>/.*/banners/.*</literal></emphasis> - A simple example
- that uses the common combination of <quote>.</quote> and <quote>*</quote> to
- denote any character, zero or more times. In other words, any string at all.
- So we start with a literal forward slash, then our regular expression pattern
- (<quote>.*</quote>), another literal forward slash, the string
- <quote>banners</quote>, another forward slash, and lastly another
- <quote>.*</quote>. We are building
- a directory path here. This will match any file with the path that has a
- directory named <quote>banners</quote> in it. The <quote>.*</quote> matches
- any characters, and this could conceivably be more forward slashes, so it
- might expand into a much longer looking path. For example, this could match:
- <quote>/eye/hate/spammers/banners/annoy_me_please.gif</quote>, or just
- <quote>/banners/annoying.html</quote>, or almost an infinite number of other
- possible combinations, just so it has <quote>banners</quote> in the path
- somewhere.
-</para>
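-
-<para>
- As a minimal sketch of how such a pattern might be used, the fragment below
- places it in an actions file section. The reason text given to
- <literal>+block</literal> is only an illustrative assumption; any wording
- can be used:
-</para>
-
-<para>
- <screen>
- # Block any URL, on any host, whose path matches the banners pattern.
- # (A pattern that starts with "/" has no domain part.)
- # The reason text is a made-up example.
- {+block{Banner directory.}}
- /.*/banners/.*</screen>
-</para>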
-
-<para>
- And now something a little more complex:
-</para>
-
-<para>
- <emphasis><literal>/.*/adv((er)?ts?|ertis(ing|ements?))?/</literal></emphasis> -
- We have several literal forward slashes again (<quote>/</quote>), so we are
- building another expression that describes a file path. We have another
- <quote>.*</quote>, so any sub-path at all can precede the part we care
- about. The only true literal that <emphasis>must
- match</emphasis> our pattern is <quote>adv</quote>, together with
- the forward slashes. What comes after the <quote>adv</quote> string is the
- interesting part.
-</para>
-
-<para>
- Remember the <quote>?</quote> means the preceding expression (either a
- literal character or anything grouped with <quote>(...)</quote> in this case)
- can exist or not, since this means either zero or one match. So
- <quote>((er)?ts?|ertis(ing|ements?))</quote> is optional, as are the
- individual sub-expressions: <quote>(er)</quote>,
- <quote>(ing|ements?)</quote>, and the <quote>s</quote>. The <quote>|</quote>
- means <quote>or</quote>. We have two of those. For instance,
- <quote>(ing|ements?)</quote>, can expand to match either <quote>ing</quote>
- <emphasis>OR</emphasis> <quote>ements?</quote>. The aim here is to match
- as many variations of <quote>advertisement</quote>, and similar words,
- as possible. So this would expand to match just <quote>adv</quote>,
- or <quote>advert</quote>, or <quote>adverts</quote>, or
- <quote>advertising</quote>, or <quote>advertisement</quote>, or
- <quote>advertisements</quote>. You get the idea. But it would not match
- <quote>advertizements</quote> (with a <quote>z</quote>). We could fix that by
- changing our regular expression to:
- <quote>/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/</quote>, which would then match
- either spelling.
-</para>
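-
-<para>
- For illustration, the extended expression could be placed in an actions file
- section as sketched below. The section header and its reason text are
- assumptions chosen for this example, not taken from the default
- configuration:
-</para>
-
-<para>
- <screen>
- # Matches directory names such as adv, advert, adverts, advertising,
- # advertisement(s), and the "z" spellings advertizing and
- # advertizement(s), after an arbitrary leading sub-path, on any host.
- # The reason text is a made-up example.
- {+block{Advertising directory.}}
- /.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/</screen>
-</para>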
-
-<para>
- <emphasis><literal>/.*/advert[0-9]+\.(gif|jpe?g)</literal></emphasis> - Again
- another path statement with forward slashes. Anything in the square brackets
- <quote>[ ]</quote> can be matched. This is using <quote>0-9</quote> as a
- shorthand expression to mean any digit zero through nine. It is the same as
- saying <quote>0123456789</quote>. So any digit matches. The <quote>+</quote>
- means one or more of the preceding expression must be included. The preceding
- expression here is what is in the square brackets -- in this case, any digit
- zero through nine. Then, at the end, we have a grouping: <quote>(gif|jpe?g)</quote>.
- This includes a <quote>|</quote>, so the expression on either side of that
- bar character may match. A simple <quote>gif</quote> on one side, and the other
- side will in turn match either <quote>jpeg</quote> or <quote>jpg</quote>,
- since the <quote>?</quote> means the letter <quote>e</quote> is optional and
- can be matched once or not at all. So we are building an expression here to
- match a GIF or JPEG image file. It must include the literal
- string <quote>advert</quote>, then one or more digits, and a <quote>.</quote>
- (which is now a literal, and not a special character, since it is escaped
- with <quote>\</quote>), and lastly either <quote>gif</quote>, or
- <quote>jpeg</quote>, or <quote>jpg</quote>. Some possible matches would
- include: <quote>//advert1.jpg</quote>,
- <quote>/nasty/ads/advert1234.gif</quote>,
- <quote>/banners/from/hell/advert99.jpg</quote>. It would not match
- <quote>advert1.gif</quote> (no leading slash), or
- <quote>/adverts232.jpg</quote> (the expression does not include an
- <quote>s</quote>), or <quote>/advert1.jsp</quote> (<quote>jsp</quote> is not
- in the expression anywhere).
-</para>
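-
-<para>
- A sketch of how this last expression might be combined with actions: since
- the pattern only matches image files, it can reasonably be paired with
- <literal>+handle-as-image</literal>, so that a blocked request is typically
- answered with a replacement image rather than an HTML page. The reason text
- is an illustrative assumption:
-</para>
-
-<para>
- <screen>
- # Numbered advert images, e.g. /nasty/ads/advert1234.gif
- # The reason text is a made-up example.
- {+block{Numbered advert image.} +handle-as-image}
- /.*/advert[0-9]+\.(gif|jpe?g)</screen>
-</para>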
-
-<para>
- We are barely scratching the surface of regular expressions here so that you
- can understand the default <application>Privoxy</application>
- configuration files, and maybe use this knowledge to customize your own
- installation. There is much, much more that can be done with regular
- expressions. Now that you know enough to get started, you can learn more on
- your own :/
-</para>
-
-<para>
- More reading on Perl Compatible Regular Expressions:
- <ulink url="http://perldoc.perl.org/perlre.html">http://perldoc.perl.org/perlre.html</ulink>
-</para>
-
-<para>
- For information on regular expression based substitutions and their applications
- in filters, please see the <link linkend="filter-file">filter file tutorial</link>
- in this manual.
-</para>
-</sect2>
-
-<!-- ~ End section ~ -->
-
-
-<!-- ~~~~~ New section ~~~~~ -->
-<sect2>
-<title>Privoxy's Internal Pages</title>
-
-<para>
- Since <application>Privoxy</application> proxies each requested
- web page, it is easy for <application>Privoxy</application> to
- trap certain special URLs. In this way, we can talk directly to
- <application>Privoxy</application>, and see how it is
- configured, see how our rules are being applied, change these
- rules and other configuration options, and even turn
- <application>Privoxy's</application> filtering off, all with
- a web browser.
-
-</para>
-
-<para>
- The URLs listed below are the special ones that allow direct access
- to <application>Privoxy</application>. Of course,
- <application>Privoxy</application> must be running to access these. If
- not, you will get a friendly error message. Internet access is not
- required to use them.
-</para>
-
-<para>
- <itemizedlist>
-
- <listitem>
- <para>
- Privoxy main page:
- </para>
- <blockquote>
- <para>
- <ulink url="http://config.privoxy.org/">http://config.privoxy.org/</ulink>
- </para>
- </blockquote>
- <para>
- There is a shortcut: <ulink url="http://p.p/">http://p.p/</ulink> (but it
- doesn't provide a fall-back to a real page in case the request is not
- sent through <application>Privoxy</application>).
- </para>
- </listitem>
-
- <listitem>
- <para>
- Show information about the current configuration, including viewing and
- editing of actions files:
- </para>
- <blockquote>
- <para>
- <ulink url="http://config.privoxy.org/show-status">http://config.privoxy.org/show-status</ulink>
- </para>
- </blockquote>
- </listitem>
-
- <listitem>
- <para>
- Show the source code version numbers:
- </para>
- <blockquote>
- <para>
- <ulink url="http://config.privoxy.org/show-version">http://config.privoxy.org/show-version</ulink>
- </para>
- </blockquote>
- </listitem>
-
- <listitem>
- <para>
- Show the browser's request headers:
- </para>
- <blockquote>
- <para>
- <ulink url="http://config.privoxy.org/show-request">http://config.privoxy.org/show-request</ulink>
- </para>
- </blockquote>
- </listitem>
-
- <listitem>
- <para>
- Show which actions apply to a URL and why:
- </para>
- <blockquote>
- <para>
- <ulink url="http://config.privoxy.org/show-url-info">http://config.privoxy.org/show-url-info</ulink>
- </para>
- </blockquote>
- </listitem>
-
- <listitem>
- <para>
- Toggle <application>Privoxy</application> on or off. This feature can be enabled or disabled in the main
- <filename>config</filename> file. When toggled <quote>off</quote>, <application>Privoxy</application>
- continues to run, but only as a pass-through proxy, with no actions taking
- place:
- </para>
- <blockquote>
- <para>
- <ulink url="http://config.privoxy.org/toggle">http://config.privoxy.org/toggle</ulink>
- </para>
- </blockquote>
- <para>
- Shortcuts to turn it off, then on:
- </para>
- <blockquote>
- <para>
- <ulink url="http://config.privoxy.org/toggle?set=disable">http://config.privoxy.org/toggle?set=disable</ulink>
- </para>
- </blockquote>
- <blockquote>
- <para>
- <ulink url="http://config.privoxy.org/toggle?set=enable">http://config.privoxy.org/toggle?set=enable</ulink>
- </para>
- </blockquote>
- </listitem>
-
- </itemizedlist>
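-
- As a sketch of the <filename>config</filename> switch mentioned in the toggle
- entry above, and assuming the installed version provides the
- <literal>enable-remote-toggle</literal> directive, a line like the following
- would allow the toggle URLs to work (a value of 0 would disable them):
- <screen>
- # Assumption: this version's main config file supports this directive.
- enable-remote-toggle  1</screen>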