+ Any web page can be dynamically modified with the filter file. This
+ modification can be the removal, or rewriting, of any web page content,
+ including tags and non-visible content. The default filter file is
+ <filename>default.filter</filename>, located in the config directory.
+</para>
+
+<para>
+ This is potentially a very powerful feature, and requires knowledge of both
+ <quote>regular expressions</quote> and HTML in order to create custom
+ filters. However, a number of useful filters for many common situations are
+ included with <application>Privoxy</application>.
+</para>
+
+<para>
+ The included example file is divided into sections. Each section begins
+ with the <literal>FILTER</literal> keyword, followed by the identifier
+ for that section, e.g. <quote>FILTER: webbugs</quote>. Each section performs
+ a similar type of filtering, such as <quote>html-annoyances</quote>.
+</para>
+
+<para>
+ This file uses regular expressions to alter or remove any string in the
+ target page. The expressions can only operate on one line at a time. Some
+ examples from the included <filename>default.filter</filename>:
+</para>
+
+<para>
+ Stop web pages from opening browser windows that cannot be resized or that
+ lack a location, status, or menu bar, and strip blink tags:
+</para>
+
+<para>
+ <literal>
+ <msgtext>
+ <literallayout>
+ FILTER: html-annoyances
+
+ # New browser windows should be resizeable and have a location and status
+ # bar. Make it so.
+ #
+ s/resizable="?(no|0)"?/resizable=1/ig s/noresize/yesresize/ig
+ s/location="?(no|0)"?/location=1/ig s/status="?(no|0)"?/status=1/ig
+ s/scrolling="?(no|0|Auto)"?/scrolling=1/ig
+ s/menubar="?(no|0)"?/menubar=1/ig
+
+ # The <BLINK> tag was a crime!
+ #
+ s*<blink>|</blink>**ig
+
+ # Is this evil?
+ #
+ #s/framespacing="?(no|0)"?//ig
+ #s/margin(height|width)=[0-9]*//gi
+ </literallayout>
+ </msgtext>
+ </literal>
+</para>
+
+<para>
+ Just for kicks, replace any occurrence of <quote>Microsoft</quote> with
+ <quote>MicroSuck</quote>, and have a little fun with topical buzzwords:
+</para>
+
+<para>
+ <literal>
+ <msgtext>
+ <literallayout>
+ FILTER: fun
+
+ s/microsoft(?!.com)/MicroSuck/ig
+
+ # Buzzword Bingo:
+ #
+ s/industry-leading|cutting-edge|award-winning/<font color=red><b>BINGO!</b></font>/ig
+ </literallayout>
+ </msgtext>
+ </literal>
+</para>
+
+<para>
+ Kill those pesky little web-bugs:
+</para>
+
+<para>
+ <literal>
+ <msgtext>
+ <literallayout>
+ # webbugs: Squish WebBugs (1x1 invisible GIFs used for user tracking)
+ FILTER: webbugs
+
+ s/<img\s+[^>]*?(width|height)\s*=\s*['"]?1\D[^>]*?(width|height)\s*=\s*['"]?1(\D[^>]*?)?>/<!-- Squished WebBug -->/sig
+ </literallayout>
+ </msgtext>
+ </literal>
+</para>
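+
+<para>
+ The substitution jobs in the examples above use the Perl-style
+ <literal>s/pattern/replacement/flags</literal> form, so their effect can be
+ previewed outside of <application>Privoxy</application>. The following is
+ only a minimal sketch in Python, whose <literal>re</literal> module is
+ similar to, but not identical with, pcre. The helper
+ <literal>apply_job</literal> and the sample strings are assumptions made up
+ for this illustration; they are not part of <application>Privoxy</application>.
+</para>
+
+<para>
+ <literal>
+ <msgtext>
+ <literallayout>
```python
import re


def apply_job(pattern, replacement, text):
    """Rough stand-in for a filter job of the form s/pattern/replacement/ig."""
    # "i" maps to re.IGNORECASE; "g" (replace every match) is what
    # re.sub already does by default.
    return re.sub(pattern, replacement, text, flags=re.IGNORECASE)


# Hypothetical page fragment, invented for this illustration.
page = "window.open('ad.html','x','resizable=no,status=0,menubar=\"no\"')"

# Three of the html-annoyances jobs, applied one after the other.
page = apply_job(r'resizable="?(no|0)"?', "resizable=1", page)
page = apply_job(r'status="?(no|0)"?', "status=1", page)
page = apply_job(r'menubar="?(no|0)"?', "menubar=1", page)
print(page)

# The "fun" job: the (?!.com) look-ahead leaves "microsoft.com" untouched.
brand = apply_job(r"microsoft(?!.com)", "MicroSuck",
                  "Microsoft news at microsoft.com")
print(brand)
```
+ </literallayout>
+ </msgtext>
+ </literal>
+</para>
+
+<para>
+ Testing a pattern this way against a saved copy of a page is one possible
+ way to check a new job before adding it to the filter file, bearing in mind
+ that the two regular expression engines may differ in detail.
+</para>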
+
+</sect2>
+
+<!-- ~ End section ~ -->
+
+
+
+<!-- ~~~~~ New section ~~~~~ -->
+
+<sect2>
+<title>Templates</title>
+<para>
+ When <application>Privoxy</application> displays one of its internal
+ pages, such as a 404 Not Found error page, it uses the appropriate template.
+ On Linux, BSD, and Unix, these are located in
+ <filename>/etc/privoxy/templates</filename> by default. These may be
+ customized, if desired. <filename>cgi-style.css</filename> is
+ used to control the HTML attributes (fonts, etc.).
+</para>
+<para>
+ The default <quote>Blocked</quote> page, with its bright red top
+ banner, is called simply <quote><filename>blocked</filename></quote>. This
+ may be customized or replaced with something else if desired.
+</para>
+</sect2>
+
+</sect1>
+
+<!-- ~ End section ~ -->
+
+
+
+<!-- ~~~~~ New section ~~~~~ -->
+
+<sect1 id="contact"><title>Contacting the Developers, Bug Reporting and Feature
+Requests</title>
+
+<!-- Include contacting.sgml boilerplate: -->
+ &contacting;
+<!-- end boilerplate -->
+
+
+<!-- ~~~~~ New section ~~~~~ -->
+<sect2 id="submitactions">
+<title>Submitting Ads and <quote>Action</quote> Problems</title>
+<para>
+ Ads and banners that are not stopped by <application>Privoxy</application>
+ can be submitted to the developers by accessing a special page and filling
+ out the brief, required form. Conversely, you can also report pages, images,
+ etc. that <application>Privoxy</application> is blocking, but should not.
+ The form itself does require Internet access.
+</para>
+<para>
+ To do this, point your browser to <application>Privoxy</application>
+ at <ulink url="http://config.privoxy.org/">http://config.privoxy.org/</ulink>
+ (shortcut: <ulink url="http://p.p/">http://p.p/</ulink>), and then select
+ <ulink url="javascript:w=Math.floor(screen.width/2);h=Math.floor(screen.height*0.9);void(window.open('http://www.privoxy.org/actions','Feedback','screenx='+w+',width='+w+',height='+h+',scrollbars=yes,toolbar=no,location=no,directories=no,status=no,menubar=no,copyhistory=no').focus());">Actions file feedback system</ulink>,
+ near the bottom of the page. Paste in the URL that is the cause of the
+ unwanted behavior, and follow the prompts. The developers will
+ try to incorporate a fix for the problem you reported into future versions.
+</para>
+
+<para>
+ New <filename>default.actions</filename> files will occasionally be made
+ available based on your feedback. These
+ will be announced on the
+ <ulink
+ url="http://lists.sourceforge.net/lists/listinfo/ijbswa-announce">ijbswa-announce</ulink>
+ list.
+</para>
+</sect2>
+
+</sect1>
+
+
+<!-- ~~~~~ New section ~~~~~ -->
+<sect1 id="copyright"><title>Copyright and History</title>
+
+<sect2><title>Copyright</title>
+<!-- Include copyright.sgml: -->
+ &copyright;
+<!-- end copyright -->
+</sect2>
+
+<!-- ~ End section ~ -->
+
+
+<!-- ~~~~~ New section ~~~~~ -->
+
+<sect2 id="history"><title>History</title>
+<!-- Include history.sgml: -->
+ &history;
+<!-- end history -->
+</sect2>
+</sect1>
+
+<!-- ~~~~~ New section ~~~~~ -->
+<sect1 id="seealso"><title>See Also</title>
+<!-- Include seealso.sgml: -->
+ &seealso;
+<!-- end seealso -->
+</sect1>
+
+
+
+<!-- ~~~~~ New section ~~~~~ -->
+<sect1 id="appendix"><title>Appendix</title>
+
+
+<!-- ~~~~~ New section ~~~~~ -->
+<sect2 id="regex">
+<title>Regular Expressions</title>
+<para>
+ <application>Privoxy</application> can use <quote>regular expressions</quote>
+ in various config files, assuming support for <quote>pcre</quote> (Perl
+ Compatible Regular Expressions) is compiled in, which is the default. Such
+ configuration directives do not require regular expressions, but they can be
+ used to increase flexibility by matching a pattern with wild-cards against
+ URLs.
+</para>
+
+<para>
+ If you are reading this, you probably don't understand what <quote>regular
+ expressions</quote> are, or what they can do. So this will be a very brief
+ introduction only. A full explanation would require a book ;-)
+</para>
+
+<para>
+ <quote>Regular expressions</quote> are a way of matching one character
+ expression against another to see if it matches or not. One of the
+ <quote>expressions</quote> is a literal string of readable characters
+ (letters, numbers, etc.), and the other is a complex string of literal
+ characters combined with wild-cards and other special characters, called
+ meta-characters. The <quote>meta-characters</quote> have special meanings and
+ are used to build the complex pattern to be matched against. Perl Compatible
+ Regular Expressions is an enhanced form of the regular expression language
+ with backward compatibility.
+</para>
+
+<para>
+ To make a simple analogy, we do something similar when we use wild-card
+ characters when listing files with the <command>dir</command> command in DOS.
+ <literal>*.*</literal> matches all filenames. The <quote>special</quote>
+ character here is the asterisk, which matches any and all characters. We can be
+ more specific and use <literal>?</literal> to match just individual
+ characters. So <quote>dir file?.txt</quote> would match
+ <quote>file1.txt</quote>, <quote>file2.txt</quote>, etc. We are pattern
+ matching, using a similar technique to <quote>regular expressions</quote>!
+</para>
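+
+<para>
+ The wild-card analogy can be tried out with a few lines of Python. This is
+ only a rough sketch: Python's <literal>fnmatch</literal> module implements
+ Unix shell-style wild-cards, which is assumed here to be close enough to the
+ DOS behavior for these simple cases, and the file names are made up for
+ this illustration.
+</para>
+
+<para>
+ <literal>
+ <msgtext>
+ <literallayout>
```python
from fnmatch import fnmatch

# Hypothetical file names, invented for this illustration.
names = ["file1.txt", "file2.txt", "file10.txt", "notes.txt"]

# "?" stands for exactly one character, so "file10.txt" is not matched.
matches = [name for name in names if fnmatch(name, "file?.txt")]
print(matches)

# "*" stands for any run of characters, so "*.*" matches every name here.
all_match = all(fnmatch(name, "*.*") for name in names)
print(all_match)
```
+ </literallayout>
+ </msgtext>
+ </literal>
+</para>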
+
+<para>
+ Regular expressions do essentially the same thing, but are much, much more
+ powerful. There are many more <quote>special characters</quote> and ways of
+ building complex patterns, however. Let's look at a few of the common ones,
+ and then some examples:
+</para>
+
+<para><simplelist>
+ <member>
+ <emphasis>.</emphasis> - Matches any single character, e.g. <quote>a</quote>,
+ <quote>A</quote>, <quote>4</quote>, <quote>:</quote>, or <quote>@</quote>.
+ </member>
+</simplelist></para>
+
+<para><simplelist>
+ <member>
+ <emphasis>?</emphasis> - The preceding character or expression is matched ZERO or ONE
+ times. Either/or.
+ </member>
+</simplelist></para>
+
+<para><simplelist>
+ <member>
+ <emphasis>+</emphasis> - The preceding character or expression is matched ONE or MORE
+ times.
+ </member>
+</simplelist></para>
+
+<para><simplelist>
+ <member>
+ <emphasis>*</emphasis> - The preceding character or expression is matched ZERO or MORE
+ times.
+ </member>
+</simplelist></para>
+
+<para><simplelist>
+ <member>
+ <emphasis>\</emphasis> - The <quote>escape</quote> character denotes that
+ the following character should be taken literally. This is used where one of the
+ special characters (e.g. <quote>.</quote>) needs to be taken literally and
+ not as a special meta-character.
+ </member>
+</simplelist></para>
+
+<para><simplelist>
+ <member>
+ <emphasis>[]</emphasis> - Characters enclosed in brackets will be matched if
+ any of the enclosed characters are encountered.
+ </member>
+</simplelist></para>
+
+<para><simplelist>
+ <member>
+ <emphasis>()</emphasis> - Parentheses are used to group a sub-expression,
+ or multiple sub-expressions.
+ </member>
+</simplelist></para>
+
+<para><simplelist>
+ <member>
+ <emphasis>|</emphasis> - The <quote>bar</quote> character works like an
+ <quote>or</quote> conditional statement. A match is successful if the
+ sub-expression on either side of <quote>|</quote> matches.
+ </member>
+</simplelist></para>
+
+<para><simplelist>
+ <member>
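+ <emphasis>s/string1/string2/g</emphasis> - This is used to rewrite strings of text.
+ <quote>string1</quote> is replaced by <quote>string2</quote> in this
+ example. The trailing <quote>g</quote> applies the replacement to every
+ occurrence rather than only the first.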
+ </member>
+</simplelist></para>
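+
+<para>
+ The meta-characters listed above can be checked one at a time. The
+ following is a minimal sketch in Python, whose <literal>re</literal> module
+ handles these basic meta-characters in the same way as pcre for the cases
+ shown, although the two engines are not identical in general. The patterns
+ and sample strings are assumptions chosen for this illustration only.
+</para>
+
+<para>
+ <literal>
+ <msgtext>
+ <literallayout>
```python
import re

# Each entry: (pattern, sample text, whether the whole sample should match).
cases = [
    (r"b.t", "bat", True),               # . matches any single character
    (r"colou?r", "color", True),         # ? makes the preceding "u" optional
    (r"go+gle", "ggle", False),          # + needs at least one "o"
    (r"go*gle", "ggle", True),           # * also accepts zero "o" characters
    (r"a\.b", "axb", False),             # \. matches only a literal dot
    (r"[bc]at", "cat", True),            # [] matches any one listed character
    (r"(ad|banner)s", "banners", True),  # () groups, | means "or"
]

# fullmatch succeeds only if the pattern covers the entire sample string.
results = [bool(re.fullmatch(pattern, sample)) for pattern, sample, _ in cases]
for (pattern, sample, expected), got in zip(cases, results):
    print(f"{pattern!r:16} {sample!r:10} expected={expected} got={got}")

# Counterpart of s/string1/string2/g: every occurrence is replaced.
rewritten = re.sub(r"string1", "string2", "string1 and string1")
print(rewritten)
```
+ </literallayout>
+ </msgtext>
+ </literal>
+</para>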
+
+<para>
+ The meta-characters listed above are just some of the ones you are likely to
+ use when matching URLs with
+ <application>Privoxy</application>, and are a long way from a definitive
+ list. This is enough to get us started with a few simple examples which may
+ be more illuminating:
+</para>
+
+<para>
+ <emphasis><literal>/.*/banners/.*</literal></emphasis> - A simple example
+ that uses the common combination of <quote>.</quote> and <quote>*</quote> to
+ denote any character, zero or more times. In other words, any string at all.
+ So we start with a literal forward slash, then our regular expression pattern
+ (<quote>.*</quote>), another literal forward slash, the string
+ <quote>banners</quote>, another forward slash, and lastly another
+ <quote>.*</quote>. We are building
+ a directory path here. This will match any file whose path has a
+ directory named <quote>banners</quote> in it. The <quote>.*</quote> matches
+ any characters, and this could conceivably be more forward slashes, so it
+ might expand into a much longer looking path. For example, this could match:
+ <quote>/eye/hate/spammers/banners/annoy_me_please.gif</quote>, or just
+ <quote>/banners/annoying.html</quote>, or an almost infinite number of other
+ possible combinations, as long as it has <quote>banners</quote> in the path
+ somewhere.