Privoxy 3.0.20 User Manual

9. Filter Files

On-the-fly text substitutions need to be defined in a "filter file". Once defined, they can then be invoked as an "action".

Privoxy supports three different pcrs-based filter actions: filter to rewrite the content that is sent to the client, client-header-filter to rewrite headers that are sent by the client, and server-header-filter to rewrite headers that are sent by the server.

Privoxy also supports two tagger actions: client-header-tagger and server-header-tagger. Taggers and filters use the same syntax in the filter files; the difference is that taggers don't modify the text they are filtering, but use a rewritten version of the filtered text as a tag. The tags can then be used to change which actions apply, through sections with tag-patterns.
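
As a rough illustration of the tagger idea, here is a minimal Python sketch, not Privoxy's implementation. Python's re module stands in for pcrs, the function tag_from_header and the sample header are hypothetical, and the sketch assumes that no tag is created when the job leaves the header unchanged.

```python
import re


def tag_from_header(header, pattern, substitute, flags=0):
    """Hypothetical tagger sketch: rewrite a copy of the header and return the
    rewritten copy as the tag. The header that gets forwarded is not modified.
    Assumption: no tag is produced if the job does not change the text."""
    rewritten = re.sub(pattern, substitute, header, count=1, flags=flags)
    return rewritten if rewritten != header else None


header = "User-Agent: Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0"
tag = tag_from_header(header, r"^User-Agent:.*Firefox.*$", "FIREFOX", re.IGNORECASE)
print(tag)     # FIREFOX
print(header)  # still the original header; only the tag is rewritten text
```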

Finally, Privoxy supports the external-filter action to enable external filters written in proper programming languages.

Multiple filter files can be defined through the filterfile config directive. The filters as supplied by the developers are located in default.filter. It is recommended that any locally defined or modified filters go in a separately defined file such as user.filter.

Common tasks for content filters are to eliminate common annoyances in HTML and JavaScript, such as pop-up windows, exit consoles, crippled windows without navigation tools, the infamous <BLINK> tag etc., to suppress images with certain width and height attributes (standard banner sizes or web-bugs), or just to have fun.

Enabled content filters are applied to any content whose "Content Type" header is recognised as a sign of text-based content, with the exception of text/plain. Use the force-text-mode action to also filter other content.

Substitutions are made at the source level, so if you want to "roll your own" filters, you should first be familiar with HTML syntax, and, of course, regular expressions.

Just like the actions files, the filter file is organized in sections, which are called filters here. Each filter consists of a heading line that starts with one of the keywords FILTER:, CLIENT-HEADER-FILTER: or SERVER-HEADER-FILTER:, followed by the filter's name and a short (one line) description of what it does. Below that line come the jobs, i.e. lines that define the actual text substitutions. By convention, the name of a filter should describe what the filter eliminates. The description is used in the web-based user interface.

Once a filter called name has been defined in the filter file, it can be invoked by using an action of the form +filter{name} in any actions file.

Filter definitions start with a header line that contains the filter type, the filter name and the filter description. A content filter header line for a filter called "foo" could look like this:

FILTER: foo Replace all "foo" with "bar"
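
The section layout can be sketched in Python as a small parser that collects jobs under their heading lines. This is a hypothetical helper, much simpler than Privoxy's own loader: it only knows the three heading keywords listed above, skips blank lines and lines starting with '#', and assumes that a job ending in a backslash continues on the next line.

```python
import re

# The three heading keywords named in the text; tagger keywords are not handled here.
HEADER = re.compile(r"^(FILTER|CLIENT-HEADER-FILTER|SERVER-HEADER-FILTER):\s*(\S+)\s*(.*)$")


def parse_filter_file(text):
    """Hypothetical sketch: return {name: (type, description, [jobs])}."""
    filters, current, pending = {}, None, ""
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and '#' comments unless we are inside a continued job.
        if not pending and (not line or line.startswith("#")):
            continue
        # Assumed continuation rule: a trailing backslash joins the next line.
        if line.endswith("\\"):
            pending += line[:-1]
            continue
        line, pending = pending + line, ""
        header = HEADER.match(line)
        if header:
            kind, name, description = header.groups()
            current = filters[name] = (kind, description, [])
        elif current is not None:
            current[2].append(line)  # a job belongs to the most recent heading
    return filters


sample = 'FILTER: foo Replace all "foo" with "bar"\ns/foo/bar/g\n'
print(parse_filter_file(sample))
# {'foo': ('FILTER', 'Replace all "foo" with "bar"', ['s/foo/bar/g'])}
```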

Below that line, and up to the next header line, come the jobs that define what text replacements the filter executes. They are specified in a syntax that imitates Perl's s/// operator. If you are familiar with Perl, you will find this to be quite intuitive, and may want to look at the PCRS documentation for the subtle differences to Perl behaviour.
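
For readers more at home in Python than Perl, the sketch below approximates a single job of the form s/pattern/substitute/options with Python's re.sub. The helpers split_job and apply_job are hypothetical and cover only part of pcrs: the options i, s, m, x and g are mapped to re flags and the count argument, the non-standard options described next (U, D, T) are rejected, and only $& and $1 to $9 are expanded in the substitute.

```python
import re

# pcrs option letters with a direct Python counterpart (not an exhaustive mapping).
FLAGS = {"i": re.IGNORECASE, "s": re.DOTALL, "m": re.MULTILINE, "x": re.VERBOSE}


def split_job(job):
    """Split 's<d>pattern<d>substitute<d>options' on the unescaped delimiter <d>."""
    if len(job) < 2 or job[0] != "s":
        raise ValueError("a job starts with 's' followed by its delimiter")
    delimiter, parts, current, i = job[1], [], "", 2
    while i < len(job):
        if job[i] == "\\" and i + 1 < len(job):
            current += job[i:i + 2]  # escapes are passed through unchanged
            i += 2
            continue
        if job[i] == delimiter:
            parts.append(current)
            current = ""
        else:
            current += job[i]
        i += 1
    parts.append(current)
    if len(parts) != 3:
        raise ValueError("expected pattern, substitute and options")
    pattern, substitute, options = parts
    return pattern, substitute, options.strip()


def apply_job(job, text):
    """Approximate one pcrs job with re.sub (hypothetical helper)."""
    pattern, substitute, options = split_job(job)
    unsupported = set(options) - set(FLAGS) - {"g"}
    if unsupported:
        raise NotImplementedError("not covered by this sketch: " + "".join(sorted(unsupported)))
    flags = 0
    for letter in options:
        flags |= FLAGS.get(letter, 0)

    def expand(match):
        # $& is the whole match; $1..$9 are the parenthesized sub-patterns.
        def ref(r):
            key = r.group(1)
            if key == "&":
                return match.group(0)
            index = int(key)
            return (match.group(index) or "") if index <= match.re.groups else ""
        return re.sub(r"\$(&|[1-9])", ref, substitute)

    # Without g only the first match is replaced (count=1); with g, all (count=0).
    return re.sub(pattern, expand, text, count=0 if "g" in options else 1, flags=flags)


print(apply_job("s/foo/bar/", "foo foo"))    # bar foo
print(apply_job("s/foo/bar/g", "foo foo"))   # bar bar
print(apply_job("s|(a)(b)|$2$1|g", "abab"))  # baba
```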

Most notably, the non-standard option letter U is supported, which turns the default to ungreedy matching (add ? to quantifiers to turn them greedy again).
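
Python's re module has no equivalent of the U option, so the sketch below writes out both behaviours with explicit quantifiers on a made-up snippet of HTML: a plain .* is greedy, while .*? behaves the way .* does under U.

```python
import re

html = "<b>one</b> and <b>two</b>"

# Default (greedy): .* runs on to the LAST "</b>".
greedy = re.findall(r"<b>.*</b>", html)

# Under U every quantifier is ungreedy, so .* stops at the FIRST "</b>".
# Python needs this spelled out as .*?
ungreedy = re.findall(r"<b>.*?</b>", html)

print(greedy)    # ['<b>one</b> and <b>two</b>']
print(ungreedy)  # ['<b>one</b>', '<b>two</b>']
```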

The non-standard option letter D (dynamic) allows the use of the variables $host, $origin (the IP address the request came from), $path, $url and $listen-address (the address on which Privoxy accepted the client request, for example 127.0.0.1:8118). They will be replaced with the value they refer to before the filter is executed.

Note that '$' is a bad choice for a delimiter in a dynamic filter, as you might end up with unintended variables if you use a variable name directly after the delimiter. Variables will be resolved without escaping anything, therefore you also have to be careful not to choose delimiters that can appear in the variables' values. For example, '<' should be safe, while '?' will sooner or later cause conflicts with $url.
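
The following Python sketch mimics the D option as described above, under stated assumptions: the helper expand_dynamic and all variable values are made up, and variables are replaced by plain text search without escaping. The last two jobs show both delimiter pitfalls: '?' collides with the query string in $url, and a '$' delimiter followed by "hostname" is read as $host.

```python
import re

# Variables named for the D option; the values are made up for illustration.
VARIABLES = {
    "host": "www.example.org",
    "origin": "192.0.2.7",
    "path": "/search?q=privoxy",
    "url": "http://www.example.org/search?q=privoxy",
    "listen-address": "127.0.0.1:8118",
}


def expand_dynamic(job, variables):
    """Hypothetical helper: substitute $name by its value verbatim, without escaping."""
    names = sorted(variables, key=len, reverse=True)  # try longer names first
    alternatives = "|".join(re.escape(name) for name in names)
    return re.sub(r"\$(" + alternatives + ")", lambda m: variables[m.group(1)], job)


def delimiter_count(job):
    """How often the job's delimiter (the character after 's') occurs in the job."""
    return job[1:].count(job[1])


good = expand_dynamic("s<Powered by<Seen on $host<D", VARIABLES)
bad = expand_dynamic("s?Powered by?Seen on $url?D", VARIABLES)
accidental = expand_dynamic("s$hostname$server$D", VARIABLES)

print(good, delimiter_count(good))  # three delimiters, as a job needs
print(bad, delimiter_count(bad))    # four: the '?' inside $url broke the job
print(accidental)                   # the '$' delimiter plus "hostname" became $host
```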

The non-standard option letter T (trivial) disables the parsing of backreferences in the substitute. Use it if you want to include text like '$&' in your substitute without quoting.
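
The effect of T can be sketched with a hypothetical Python helper that either expands backreferences in the substitute or inserts it verbatim. Only $& and $1 to $9 are recognized, and the sample text is made up.

```python
import re


def substitute(pattern, replacement, text, trivial=False):
    """Hypothetical helper: replace the first match; trivial=True mimics the T option."""
    if trivial:
        # Returning the text from a function makes re.sub insert it verbatim.
        return re.sub(pattern, lambda m: replacement, text, count=1)

    def expand(m):
        # Without T, $& stands for the whole match and $1..$9 for sub-patterns.
        def ref(r):
            key = r.group(1)
            if key == "&":
                return m.group(0)
            index = int(key)
            return (m.group(index) or "") if index <= m.re.groups else ""
        return re.sub(r"\$(&|[1-9])", ref, replacement)

    return re.sub(pattern, expand, text, count=1)


print(substitute("price", "[$&]", "the price"))                # the [price]
print(substitute("price", "[$&]", "the price", trivial=True))  # the [$&]
```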

If you are new to "Regular Expressions", you might want to take a look at the Appendix on regular expressions, and see the Perl manual for the s/// operator's syntax and Perl-style regular expressions in general. The examples below might also help to get you started.

9.1. Filter File Tutorial

Now, let's complete our "foo" content filter. We have already defined the heading, but the jobs are still missing. Since all it does is to replace "foo" with "bar", there is only one (trivial) job needed:

s/foo/bar/

But wait! Didn't the description say that all occurrences of "foo" should be replaced? Our current job will only take care of the first "foo" on each page. For global substitution, we'll need to add the g option:

s/foo/bar/g
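
In Python terms, the two jobs differ like the count argument of re.sub: count=1 replaces only the first match, as s/foo/bar/ does, while the default count=0 replaces all of them, as s/foo/bar/g does. The sample text is made up.

```python
import re

page = "foo, foo and more foo"

first_only = re.sub("foo", "bar", page, count=1)  # like s/foo/bar/
everywhere = re.sub("foo", "bar", page)           # like s/foo/bar/g (count=0 means all)

print(first_only)  # bar, foo and more foo
print(everywhere)  # bar, bar and more bar
```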

Our complete filter now looks like this:

FILTER: foo Replace all "foo" with "bar"
s/foo/bar/g

Let's look at some real filters for more interesting examples. Here you see a filter that protects against some common annoyances that arise from JavaScript abuse. Let's look at its jobs one after the other:

FILTER: js-annoyances Get rid of particularly annoying JavaScript abuse

# Get rid of JavaScript referrer tracking. Test page: http://www.randomoddness.com/untitled.htm
#
s|(<script.*)document\.referrer(.*</script>)|$1"Not Your Business!"$2|Usg
 
Following the header line and a comment, you see the job. Note that it uses | as the delimiter instead of /, because the pattern contains a forward slash, which would otherwise have to be escaped by a backslash (\).

Now, let's examine the pattern: it starts with the text <script.* enclosed in parentheses. Since the dot matches any character, and * means: "Match an arbitrary number of the element left of myself", this matches "<script", followed by any text, i.e. it matches the whole page, from the start of the first <script> tag.

That's more than we want, but the pattern continues: document\.referrer matches only the exact string "document.referrer". The dot needed to be escaped, i.e. preceded by a backslash, to take away its special meaning as a wildcard, and make it just a regular dot. So far, the meaning is: Match from the start of the first <script> tag in the page, up to, and including, the text "document.referrer", if both are present in the page (and appear in that order).

But there's still more pattern to go. The next element, again enclosed in parentheses, is .*</script>. You already know what .* means, so the whole pattern translates to: Match from the start of the first <script> tag in a page to the end of the last </script> tag, provided that the text "document.referrer" appears somewhere in between.

This is still not the whole story, since we have ignored the options and the parentheses: The portions of the page matched by sub-patterns that are enclosed in parentheses will be remembered and be available through the variables $1, $2, ... in the substitute. The U option switches to ungreedy matching, which means that the first .* in the pattern will only "eat up" all text in between "<script" and the first occurrence of "document.referrer", and that the second .* will only span the text up to the first "</script>" tag. Furthermore, the s option says that the match may span multiple lines in the page, and the g option again means that the substitution is global.

So, to summarize, the pattern means: Match all scripts that contain the text "document.referrer". Remember the parts of the script from (and including) the start tag up to (and excluding) the string "document.referrer" as $1, and the part following that string, up to and including the closing tag, as $2.

Now the pattern is deciphered, but wasn't this about substituting things? So let's look at the substitute: $1"Not Your Business!"$2 is easy to read: The text remembered as $1, followed by "Not Your Business!" (including the quotation marks!), followed by the text remembered as $2. This produces an exact copy of the original string, with the middle part (the "document.referrer") replaced by "Not Your Business!".

The whole job now reads: Replace "document.referrer" by "Not Your Business!" wherever it appears inside a <script> tag. Note that this job won't break JavaScript syntax, since both the original and the replacement are syntactically valid string objects. The script just won't have access to the referrer information anymore.
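
The job can be replayed in Python to check the explanation, on a made-up page. Because Python has no U option, both .* are written as .*?; the s option becomes re.DOTALL, and g corresponds to re.sub replacing every match by default. Python spells the substitute's $1 and $2 as \1 and \2.

```python
import re

page = """<p>Hi</p>
<script>
var r = document.referrer;
</script>
<script>var x = 1;</script>"""

# Translation of s|(<script.*)document\.referrer(.*</script>)|$1"Not Your Business!"$2|Usg
pattern = re.compile(r"(<script.*?)document\.referrer(.*?</script>)", re.DOTALL)
result = pattern.sub(r'\1"Not Your Business!"\2', page)

print(result)  # the first script now assigns the string; the second is untouched
```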

We'll show you two other jobs from the JavaScript taming department, but this time only point out the constructs of special interest:

# The status bar is for displaying link targets, not pointless blahblah
#
s/window\.status\s*=\s*(['"]).*?\1/dUmMy=1/ig
 
\s stands for whitespace characters (space, tab, newline, carriage return, form feed), so that \s* means: "zero or more whitespace". The ? in .*? makes this matching of arbitrary text ungreedy. (Note that the U option is not set.) The ['"] construct means: "a single or a double quote". Finally, \1 is a back-reference to the first parenthesis just like $1 above, with the difference that in the pattern, a backslash indicates a back-reference, whereas in the substitute, it's the dollar.

So what does this job do? It replaces assignments of single- or double-quoted strings to the "window.status" object with a dummy assignment (using a variable name that is hopefully odd enough not to conflict with real variables in scripts). Thus, it catches many cases where e.g. pointless descriptions are displayed in the status bar instead of the link target when you move your mouse over links.
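
The window.status job translates almost directly into Python, since it already uses an explicit .*? and does not set U. The i option becomes re.IGNORECASE, and the pattern's back-reference \1 is spelled the same way in Python. The sample script is made up.

```python
import re

script = """window.status = 'Click here for FREE stuff!';
Window.Status="Another pointless message";
var keep = 'untouched';"""

# Translation of s/window\.status\s*=\s*(['"]).*?\1/dUmMy=1/ig
status = re.compile(r"""window\.status\s*=\s*(['"]).*?\1""", re.IGNORECASE)
result = status.sub("dUmMy=1", script)

print(result)  # both assignments become dUmMy=1; the last line is unchanged
```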

# Kill OnUnload popups. Yummy. Test: http://www.zdnet.com/zdsubs/yahoo/tree/yfs.html
#
s/(<body [^>]*)onunload(.*>)/$1never$2/iU
 
Including the OnUnload event binding in the HTML DOM was a CRIME. When I close a browser window, I want it to close and die. Basta. This job replaces the "onunload" attribute in "<body>" tags with the dummy word never. Note that the i option makes the pattern matching case-insensitive. Also note that ungreedy matching alone doesn't always guarantee a minimal match: In the first parenthesis, we had to use [^>]* instead of .* to prevent the match from exceeding the <body> tag if it doesn't contain "OnUnload", but the page's content does.
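
The caveat about [^>]* can be checked in Python on a made-up page whose <body> tag has no onunload attribute while the page text mentions it. Under U every quantifier is ungreedy, so both are written with ? below; i becomes re.IGNORECASE, and since the job has no g option, count=1 replaces only the first match. The second pattern is a deliberately flawed variant that uses .* in place of [^>]*.

```python
import re

popup = '<body onUnload="popup()">'
page = '<body class="main"><p>Talking about onUnload handlers.</p></body>'

# Translation of s/(<body [^>]*)onunload(.*>)/$1never$2/iU
safe = re.compile(r"(<body [^>]*?)onunload(.*?>)", re.IGNORECASE)
# Flawed variant: .* instead of [^>]* (still ungreedy because of U)
sloppy = re.compile(r"(<body .*?)onunload(.*?>)", re.IGNORECASE)

print(safe.sub(r"\1never\2", popup, count=1))   # <body never="popup()">
print(safe.sub(r"\1never\2", page, count=1))    # unchanged: the tag has no onunload
print(sloppy.sub(r"\1never\2", page, count=1))  # rewrites the page text by mistake
```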

The last example is from the fun department:

FILTER: fun Fun text replacements

# Spice the daily news:
#
s/microsoft(?!\.com)/MicroSuck/ig
 
Note the (?!\.com) part (a so-called negative lookahead) in the job's pattern, which means: Don't match if the string ".com" appears directly following "microsoft" in the page. This prevents links to microsoft.com from being trashed, while still replacing the word everywhere else.
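
The lookahead can be tried out in Python, whose re module supports the same (?!...) syntax; i becomes re.IGNORECASE. The sample sentence is made up.

```python
import re

news = "Microsoft said today... see www.microsoft.com for details."

# Translation of s/microsoft(?!\.com)/MicroSuck/ig
result = re.sub(r"microsoft(?!\.com)", "MicroSuck", news, flags=re.IGNORECASE)

print(result)  # MicroSuck said today... see www.microsoft.com for details.
```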

# Buzzword Bingo (example for extended regex syntax)
#
s* industry[ -]leading \
*<font color="red"><b>BINGO!</b></font> \
*igx
 
The x option in this job turns on extended syntax, and allows for e.g. the liberal use of (non-interpreted!) whitespace for nicer formatting.
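
Python's re.VERBOSE flag plays the role of the x option: whitespace outside character classes is ignored, so the alternatives can be laid out one per line. The buzzword list below is a short, made-up example, not the one from the job above; the space inside [ -] survives because it is part of a character class.

```python
import re

# Extended syntax: the indentation and line breaks below are ignored,
# except for the space inside each character class [ -].
buzzwords = re.compile(
    r"""
      industry[ -]leading
    | cutting[ -]edge
    | award[ -]winning
    """,
    re.IGNORECASE | re.VERBOSE,
)

text = "Our industry-leading, award winning, Cutting Edge product."
result = buzzwords.sub('<font color="red"><b>BINGO!</b></font>', text)

print(result)  # each of the three buzzwords is replaced by the BINGO! markup
```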
+        
+        
+          You get the idea?
+        
+      
+      
+        
+          9.2. The Pre-defined Filters
+        
+        
+          The distribution default.filter file
+          contains a selection of pre-defined filters for your convenience:
+        
+        
+          
+            
+              js-annoyances
+            
+            
+              
+                The purpose of this filter is to get rid of particularly
+                annoying JavaScript abuse. To that end, it
+              
+              
+                
+                  
+                    replaces JavaScript references to the browser's referrer
+                    information with the string "Not Your Business!". This
+                    compliments the hide-referrer
+                    action on the content level.
+                  
+                
+                
+                  
+                    removes the bindings to the DOM's unload event which we feel has no
+                    right to exist and is responsible for most "exit consoles", i.e. nasty windows that
+                    pop up when you close another one.
+                  
+                
+                
+                  
+                    removes code that causes new windows to be opened with
+                    undesired properties, such as being full-screen,
+                    non-resizeable, without location, status or menu bar etc.
+                  
+                
+              
+
+              
+                Use with caution. This is an aggressive filter, and can break
+                sites that rely heavily on JavaScript.
+              
+            
+            
+              js-events
+            
+            
+              
+                This is a very radical measure. It removes virtually all
+                JavaScript event bindings, which means that scripts can not
+                react to user actions such as mouse movements or clicks,
+                window resizing etc, anymore. Use with caution!
+              
+              
+                We strongly
+                discourage using this filter as a default since it
+                breaks many legitimate scripts. It is meant for use only on
+                extra-nasty sites (should you really need to go there).
+              
+            
+            
+              html-annoyances
+            
+            
+              
+                This filter will undo many common instances of HTML based
+                abuse.
+              
+              
+                The BLINK and MARQUEE tags are neutralized (yeah baby!), and
+                browser windows will be created as resizeable (as of course
+                they should be!), and will have location, scroll and menu
+                bars -- even if specified otherwise.
+              
+            
+            
+              content-cookies
+            
+            
+              
+                Most cookies are set in the HTTP dialog, where they can be
+                intercepted by the crunch-incoming-cookies
+                and crunch-outgoing-cookies
+                actions. But web sites increasingly make use of HTML meta
+                tags and JavaScript to sneak cookies to the browser on the
+                content level.
+              
+              
+                This filter disables most HTML and JavaScript code that reads
+                or sets cookies. It cannot detect all clever uses of these
+                types of code, so it should not be relied on as an absolute
+                fix. Use it wherever you would also use the cookie crunch
+                actions.
+              
+            
+            
+              refresh-tags
+            
+            
+              
+                Disable any refresh tags if the interval is greater than nine
+                seconds (so that redirections done via refresh tags are not
+                destroyed). This is useful for dial-on-demand setups, or for
+                those who find this HTML feature annoying.
+              
+            
+            
+              unsolicited-popups
+            
+            
+              
+                This filter attempts to prevent only "unsolicited" pop-up windows from opening, yet
+                still allow pop-up windows that the user has explicitly
+                chosen to open. It was added in version 3.0.1, as an
+                improvement over earlier such filters.
+              
+              
+                Technical note: The filter works by redefining the
+                window.open JavaScript function to a dummy function, PrivoxyWindowOpen(), during the loading
+                and rendering phase of each HTML page access, and restoring
+                the function afterward.
+              
+              
+                This is recommended only for browsers that cannot perform
+                this function reliably themselves. And be aware that some
+                sites require such windows in order to function normally. Use
+                with caution.
+              
+            
+            
+              all-popups
+            
+            
+              
+                Attempt to prevent all pop-up windows from opening. Note
+                this should be used with even more discretion than the above,
+                since it is more likely to break some sites that require
+                pop-ups for normal usage. Use with caution.
+              
+            
+            
+              img-reorder
+            
+            
+              
+                This is a helper filter that has no value if used alone. It
+                makes the banners-by-size and banners-by-link (see below) filters more
+                effective and should be enabled together with them.
+              
+            
+            
+              banners-by-size
+            
+            
+              
+                This filter removes image tags purely based on what size they
+                are. Fortunately for us, many ads and banner images tend to
+                conform to certain standardized sizes, which makes this
+                filter quite effective for ad stripping purposes.
+              
+              
+                Occasionally this filter will cause false positives on images
+                that are not ads, but just happen to be of one of the
+                standard banner sizes.
+              
+              
+                Recommended only for those who require extreme ad blocking.
+                The default block rules should catch 95+% of all ads without this
+                filter enabled.
+              
+            
+            
+              banners-by-link
+            
+            
+              
+                This is an experimental filter that attempts to kill any
+                banners if their URLs seem to point to known or suspected
+                click trackers. It is currently not of much value and is not
+                recommended for use by default.
+              
+            
+            
+              webbugs
+            
+            
+              
+                Webbugs are small, invisible images (technically 1X1 GIF
+                images), that are used to track users across websites, and
+                collect information on them. As an HTML page is loaded by the
+                browser, an embedded image tag causes the browser to contact
+                a third-party site, disclosing the tracking information
+                through the requested URL and/or cookies for that third-party
+                domain, without the user ever becoming aware of the
+                interaction with the third-party site. HTML-ized spam also
+                uses a similar technique to verify email addresses.
+              
+              
+                This filter removes the HTML code that loads such "webbugs".
+              
+            
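As a rough illustration of the idea, the following Python sketch drops `<img>` tags that declare both a width and a height of 1. It is not the actual pcrs job from default.filter. The function names are hypothetical, and the crude regular expressions only recognize numeric width/height attributes, so images sized through CSS or left unsized slip through.

```python
import re

# Any <img> start tag, matched case-insensitively.
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)


def _dimension(tag: str, name: str):
    """Return the integer value of the attribute `name` in `tag`, or None."""
    match = re.search(rf"""\b{name}\s*=\s*["']?(\d+)""", tag, re.IGNORECASE)
    return int(match.group(1)) if match else None


def strip_webbugs(html: str) -> str:
    """Remove <img> tags that declare a 1x1 size, a common webbug pattern."""
    def replace(match: re.Match) -> str:
        tag = match.group(0)
        if _dimension(tag, "width") == 1 and _dimension(tag, "height") == 1:
            return ""  # drop the tracking image entirely
        return tag  # keep every other image unchanged
    return IMG_TAG.sub(replace, html)
```

For example, `strip_webbugs('a<img src="https://tracker.example/p.gif" width="1" height="1">b')` returns `"ab"`, while an image declared as 10x1 is left alone.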
+            
+              tiny-textforms
+            
+            
+              
+                A rather special-purpose filter that can be used to enlarge
+                textareas (those multi-line text boxes in web forms) and turn
+                off hard word wrap in them. It was written for the
+                sourceforge.net tracker system where such boxes are a
+                nuisance, but it can be handy on other sites, too.
+              
+              
+                It is not recommended to use this filter as a default.
+              
+            
+            
+              jumping-windows
+            
+            
+              
+                Many consider windows that move or resize themselves to be
+                abusive. This filter neutralizes the related JavaScript code.
+                Note that some sites might not display or behave as intended
+                when using this filter. Use with caution.
+              
+            
+            
+              frameset-borders
+            
+            
+              
+                Some web designers seem to assume that everyone in the world
+                will view their web sites using the same browser brand and
+                version, screen resolution, etc., because only that assumption
+                could explain why they'd use static frame sizes, yet prevent
+                their frames from being resized by the user, should they be
+                too small to show their whole content.
+              
+              
+                This filter removes the related HTML code. It should only be
+                applied to sites which need it.
+              
+            
+            
+              demoronizer
+            
+            
+              
+                Many Microsoft products that generate HTML use non-standard
+                extensions (read: violations) of the ISO 8859-1 aka Latin-1
+                character set. This can cause those HTML documents to display
+                with errors on standard-compliant platforms.
+              
+              
+                This filter translates the MS-only characters into Latin-1
+                equivalents. It is not necessary when using MS products, and
+                it will corrupt all documents that use 8-bit character sets
+                other than Latin-1. It is mostly worthwhile for Europeans on
+                non-MS platforms who sometimes see weird garbage characters on
+                some pages, and for user agents that don't correct for this on
+                the fly.
+              
+            
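The translation can be sketched in Python as a byte-level lookup table. The table below is a hypothetical subset chosen for illustration, not the mapping used by default.filter. Bytes 0x80 to 0x9F are not printable characters in ISO 8859-1, whereas Windows-1252 uses most of them for typographic characters such as curly quotes and dashes, which is where the garbage characters come from.

```python
# Hypothetical subset of a Windows-1252 clean-up table: each byte that only
# has a printable meaning in Windows-1252 maps to a plain Latin-1 substitute.
CP1252_REPLACEMENTS = {
    0x80: b"EUR",   # euro sign
    0x85: b"...",   # horizontal ellipsis
    0x91: b"'",     # left single quotation mark
    0x92: b"'",     # right single quotation mark
    0x93: b'"',     # left double quotation mark
    0x94: b'"',     # right double quotation mark
    0x96: b"-",     # en dash
    0x97: b"--",    # em dash
    0x99: b"(TM)",  # trade mark sign
}


def demoronize(data: bytes) -> bytes:
    """Replace known Windows-1252-only bytes; keep all other bytes as they are."""
    return b"".join(CP1252_REPLACEMENTS.get(byte, bytes([byte])) for byte in data)
```

Because the lookup works on raw bytes, running it over a page in another 8-bit character set that assigns its own characters to the same byte range would rewrite legitimate text, which is the corruption described above.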
+            
+              shockwave-flash
+            
+            
+              
+                A filter for shockwave haters. As the name suggests, this
+                filter strips code out of web pages that is used to embed
+                shockwave flash objects.
+              
+              
+              
+            
+            
+              quicktime-kioskmode
+            
+            
+              
+                Changes HTML code that embeds QuickTime objects so that
+                kioskmode, which prevents saving, is disabled.
+              
+            
+            
+              fun
+            
+            
+              
+                Text replacements for subversive browsing fun. Make fun of
+                your favorite Monopolist or play buzzword bingo.
+              
+            
+            
+              crude-parental
+            
+            
+              
+                A demonstration-only filter that shows how Privoxy can be used to delete web
+                content on a keyword basis.
+              
+            
+            
+              ie-exploits
+            
+            
+              
+                An experimental collection of text replacements to disable
+                malicious HTML and JavaScript code that exploits known
+                security holes in Internet Explorer.
+              
+              
+                Presently, it only protects against Nimda and a cross-site
+                scripting bug, and would need active maintenance to provide
+                more substantial protection.
+              
+            
+            
+              site-specifics
+            
+            
+              
+                Some web sites have very specific problems, the cure for
+                which doesn't apply anywhere else, or could even cause damage
+                on other sites.
+              
+              
+                This is a collection of such site-specific cures which should
+                only be applied to the sites they were intended for, which is
+                what the supplied default.action
+                file does. Users shouldn't need to change anything regarding
+                this filter.
+              
+            
+            
+              google
+            
+            
+              
+                A CSS-based block for Google text ads. It also removes a width
+                limitation and the toolbar advertisement.
+              
+            
+            
+              yahoo
+            
+            
+              
+                Another CSS-based block, this time for Yahoo text ads. It also
+                removes a width limitation.
+              
+            
+            
+              msn
+            
+            
+              
+                Another CSS-based block, this time for MSN text ads. It also
+                removes tracking URLs and a width limitation.
+              
+            
+            
+              blogspot
+            
+            
+              
+                Cleans up some Blogspot blogs. Read the fine print before
+                using this one!
+              
+              
+                This filter also intentionally removes some navigation stuff
+                and sets the page width to 100%. As a result, some rounded
+                "corners" would appear too early or
+                not at all, and as fixing this would require a browser that
+                understands background-size (CSS3), they are removed instead.
+              
+            
+            
+              xml-to-html
+            
+            
+              
+                Server-header filter to change the Content-Type from xml to
+                html.
+              
+            
+            
+              html-to-xml
+            
+            
+              
+                Server-header filter to change the Content-Type from html to
+                xml.
+              
+            
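Server-header filters operate on headers rather than on page content. The following Python sketch shows the kind of substitution these two filters perform, assuming that each filter sees one complete header line such as `Content-Type: application/xml`. The exact patterns and target types (`text/html`, `application/xml`) are assumptions, not copied from default.filter.

```python
import re

# Assumed patterns; the actual jobs in default.filter may differ in detail.
XML_TYPE = re.compile(r"^(Content-Type:\s*)(?:text|application)/xml\b", re.IGNORECASE)
HTML_TYPE = re.compile(r"^(Content-Type:\s*)text/html\b", re.IGNORECASE)


def xml_to_html(header: str) -> str:
    """Rewrite an XML Content-Type header line to text/html."""
    return XML_TYPE.sub(r"\1text/html", header)


def html_to_xml(header: str) -> str:
    """Rewrite an HTML Content-Type header line to application/xml."""
    return HTML_TYPE.sub(r"\1application/xml", header)
```

Parameters such as `; charset=utf-8` follow the media type and are kept, and headers other than Content-Type pass through unchanged.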
+            
+              no-ping
+            
+            
+              
+                Removes the non-standard ping
+                attribute from anchor and area HTML tags.
+              
+            
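The ping attribute lists URLs that the browser notifies when the link is followed. A Python sketch of removing it, restricted to `<a>` and `<area>` start tags as described above, might look like this; the regular expressions are illustrative assumptions and not the filter's actual pcrs job.

```python
import re

# Start tags of <a> and <area> elements, the only tags this sketch touches.
LINK_TAG = re.compile(r"<(?:a|area)\b[^>]*>", re.IGNORECASE)
# A ping attribute with a double-quoted, single-quoted or unquoted value.
PING_ATTR = re.compile(
    r"""\s+ping\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""", re.IGNORECASE
)


def strip_ping(html: str) -> str:
    """Remove ping attributes from <a> and <area> tags, leaving other markup alone."""
    return LINK_TAG.sub(lambda tag: PING_ATTR.sub("", tag.group(0)), html)
```

Text outside those tags, including a literal `ping=` in page content or in other elements such as `<abbr>`, is not modified.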
+            
+              hide-tor-exit-notation
+            
+            
+              
+                Client-header filter to remove the Tor
+                exit node notation found in Host and Referer headers.
+              
+              
+                If Privoxy and Tor are chained and Privoxy is configured to use socks4a,
+                one can use "http://www.example.org.foobar.exit/" to
+                access the host "www.example.org"
+                through the Tor exit node "foobar".
+              
+              
+                As the HTTP client isn't aware of this notation, it treats
+                the whole string "www.example.org.foobar.exit" as host and uses
+                it for the "Host" and "Referer" headers. From the server's point of
+                view the resulting headers are invalid and can cause
+                problems.
+              
+              
+                An invalid "Referer" header can
+                trigger "hot-linking" protections, and
+                an invalid "Host" header will make
+                it impossible for the server to find the right virtual host
+                (one of several domains hosted on the same IP address).
+              
+              
+                This client-header filter removes the "foo.exit" part in those headers to prevent
+                the mentioned problems. Note that it only modifies the HTTP
+                headers; it doesn't prevent the server from detecting
+                your Tor exit node based on the
+                IP address the request is coming from.
+              
+            
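The header rewrite can be sketched in Python as follows. The sketch assumes the filter sees one complete header line at a time and removes the first `.nickname.exit` pair of labels that is directly followed by a port, a path, whitespace, or the end of the value. The function name and pattern are hypothetical, not the filter's actual job.

```python
import re

# ".<nickname>.exit" directly followed by a port, a path, whitespace or the end.
EXIT_NOTATION = re.compile(r"\.[^./:\s]+\.exit(?=[:/\s]|$)", re.IGNORECASE)


def hide_tor_exit_notation(header: str) -> str:
    """Strip the Tor exit notation from Host and Referer header lines."""
    name, sep, value = header.partition(":")
    if not sep or name.strip().lower() not in ("host", "referer"):
        return header  # every other header is left untouched
    # Assume the first match is the exit notation in the host name.
    return name + sep + EXIT_NOTATION.sub("", value, count=1)
```

With this sketch, `Host: www.example.org.foobar.exit` becomes `Host: www.example.org`, and a Referer of `http://www.example.org.foobar.exit/index.html` becomes `http://www.example.org/index.html`.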
+          
+        
+      
+      
+        
+          9.3. External filter syntax
+        
+        
+          External filters are scripts or programs that can modify the
+          content in case common filters aren't powerful enough.
+        
+        
+          External filters can be written in any language supported by the platform Privoxy runs on.
+        
+        
+          They are controlled with the external-filter action
+          and have to be defined in the filterfile first.
+        
+        
+          The header looks like any other filter, but instead of pcrs jobs,
+          external filters contain a single job which can be a program or a
+          shell script (which may call other scripts or programs).
+        
+        
+          External filters read the content from STDIN and write the
+          rewritten content to STDOUT. The environment variables PRIVOXY_URL,
+          PRIVOXY_PATH, PRIVOXY_HOST, PRIVOXY_ORIGIN and PRIVOXY_LISTEN_ADDRESS
+          can be used to get some details about the client request.
+        
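A minimal external filter written in Python, sketched under the assumptions that the script is executable and that Privoxy sets PRIVOXY_URL as described above. It reads the whole content from STDIN as bytes, appends an HTML comment naming the request URL, and writes the result to STDOUT. The script's behavior and the comment text are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical external filter: append an HTML comment with the request URL."""
import os
import sys


def filter_content(content: bytes, url: str) -> bytes:
    """Return content with an HTML comment naming the URL appended."""
    # Escape "--" so the URL can neither close the comment early ("-->")
    # nor introduce the "--" sequence that XML forbids inside comments.
    safe_url = url.replace("--", "%2D%2D")
    comment = f"\n<!-- filtered by Privoxy: {safe_url} -->\n"
    return content + comment.encode("utf-8")


def main() -> None:
    # Read the complete content from STDIN as raw bytes, so pages in any
    # character encoding pass through without being decoded.
    content = sys.stdin.buffer.read()
    url = os.environ["PRIVOXY_URL"]
    sys.stdout.buffer.write(filter_content(content, url))
    sys.stdout.buffer.flush()


# Only act as a filter when PRIVOXY_URL is set, as it is when Privoxy starts
# the script; this keeps imports and tests from blocking on STDIN.
if __name__ == "__main__" and "PRIVOXY_URL" in os.environ:
    main()
```

To try it by hand, set the variable yourself, e.g. `PRIVOXY_URL=http://www.example.org/ python3 url-comment.py < page.html` (the file name is hypothetical). Inside Privoxy it would be registered in a filter file with an EXTERNAL-FILTER: header whose single job is the script's path, and enabled with the external-filter action.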
+        
+          Privoxy will temporarily store the content to be filtered in the
+          temporary-directory.
+        
+        
+        
+        
+          
+            
+          
+        
+EXTERNAL-FILTER: cat Pointless example filter that doesn't actually modify the content
+/bin/cat
+
+# Incorrect reimplementation of the filter above in POSIX shell.
+#
+# Note that it's a single job that spans multiple lines, the line
+# breaks are not passed to the shell, thus the semicolons are required.
+#
+# If the script isn't trivial, it is recommended to put it into an external file.
+#
+# In general, writing external filters entirely in POSIX shell is not
+# considered a good idea.
+EXTERNAL-FILTER: cat2 Pointless example filter that despite its name may actually modify the content
+while read line; \
+do \
+  echo "$line"; \
+done
+
+EXTERNAL-FILTER: rotate-image Rotate an image by 180 degrees. Test filter with limited value.
+/usr/local/bin/convert - -rotate 180 -
+
+EXTERNAL-FILTER: citation-needed Adds a "[citation needed]" tag to an image. The coordinates may need adjustment.
+/usr/local/bin/convert - -pointsize 16 -fill white  -annotate +17+418 "[citation needed]" -
+
+            
+
+        
+          
+            
+              
+            
+            
+              
+            
+          
+                Warning
+              

+                
+                  Currently external filters are executed with Privoxy's privileges! Only use
+                  external filters you understand and trust.
+                
+              
+        
+        
+          External filters are experimental and the syntax may change in the
+          future.
+        
+      
+    
+    
+      
+      
+        
+          
+          
+          
+        
+        
+          
+          
+          
           
         
       
-
-      The x option in this job turns on extended
-      syntax, and allows for e.g. the liberal use of (non-interpreted!)
-      whitespace for nicer formatting.
-
-      You get the idea?
     
-
-    
-      9.2. The Pre-defined Filters
-
-      The distribution default.filter file
-      contains a selection of pre-defined filters for your convenience:
-
-      
-        
-          js-annoyances
-
-          
-            The purpose of this filter is to get rid of particularly
-            annoying JavaScript abuse. To that end, it
-
-            
-              
-                replaces JavaScript references to the browser's referrer
-                information with the string "Not Your Business!". This
-                complements the hide-referrer
-                action on the content level.
-              
-
-              
-                removes the bindings to the DOM's unload event which we feel has no right to
-                exist and is responsible for most "exit
-                consoles", i.e. nasty windows that pop up when you
-                close another one.
-              
-
-              
-                removes code that causes new windows to be opened with
-                undesired properties, such as being full-screen,
-                non-resizeable, without location, status or menu bar etc.
-              
-            
-
-            Use with caution. This is an aggressive filter, and can break
-            sites that rely heavily on JavaScript.
-          
-
-          js-events
-
-          
-            This is a very radical measure. It removes virtually all
-            JavaScript event bindings, which means that scripts can not react
-            to user actions such as mouse movements or clicks, window
-            resizing etc, anymore. Use with caution!
-
-            We strongly
-            discourage using this filter as a default since it breaks
-            many legitimate scripts. It is meant for use only on extra-nasty
-            sites (should you really need to go there).
-          
-
-          html-annoyances
-
-          
-            This filter will undo many common instances of HTML based
-            abuse.
-
-            The BLINK and MARQUEE tags are neutralized (yeah baby!), and
-            browser windows will be created as resizeable (as of course they
-            should be!), and will have location, scroll and menu bars -- even
-            if specified otherwise.
-          
-
-          content-cookies
-
-          
-            Most cookies are set in the HTTP dialog, where they can be
-            intercepted by the crunch-incoming-cookies
-            and crunch-outgoing-cookies
-            actions. But web sites increasingly make use of HTML meta tags
-            and JavaScript to sneak cookies to the browser on the content
-            level.
-
-            This filter disables most HTML and JavaScript code that reads
-            or sets cookies. It cannot detect all clever uses of these types
-            of code, so it should not be relied on as an absolute fix. Use it
-            wherever you would also use the cookie crunch actions.
-          
-
-          refresh tags
-
-          
-            Disable any refresh tags if the interval is greater than nine
-            seconds (so that redirections done via refresh tags are not
-            destroyed). This is useful for dial-on-demand setups, or for
-            those who find this HTML feature annoying.
-          
-
-          unsolicited-popups
-
-          
-            This filter attempts to prevent only "unsolicited" pop-up windows from opening, yet
-            still allow pop-up windows that the user has explicitly chosen to
-            open. It was added in version 3.0.1, as an improvement over
-            earlier such filters.
-
-            Technical note: The filter works by redefining the window.open
-            JavaScript function to a dummy function, PrivoxyWindowOpen(), during the loading and
-            rendering phase of each HTML page access, and restoring the
-            function afterward.
-
-            This is recommended only for browsers that cannot perform this
-            function reliably themselves. And be aware that some sites
-            require such windows in order to function normally. Use with
-            caution.
-          
-
-          all-popups
-
-          
-            Attempt to prevent all pop-up windows from opening.
-            Note this should be used with even more discretion than the
-            above, since it is more likely to break some sites that require
-            pop-ups for normal usage. Use with caution.
-          
-
-          img-reorder
-
-          
-            This is a helper filter that has no value if used alone. It
-            makes the banners-by-size and banners-by-link (see below) filters more effective
-            and should be enabled together with them.
-          
-
-          banners-by-size
-
-          
-            This filter removes image tags purely based on what size they
-            are. Fortunately for us, many ads and banner images tend to
-            conform to certain standardized sizes, which makes this filter
-            quite effective for ad stripping purposes.
-
-            Occasionally this filter will cause false positives on images
-            that are not ads, but just happen to be of one of the standard
-            banner sizes.
-
-            Recommended only for those who require extreme ad blocking.
-            The default block rules should catch 95+% of all ads without this filter enabled.
-          
-
-          banners-by-link
-
-          
-            This is an experimental filter that attempts to kill any
-            banners if their URLs seem to point to known or suspected click
-            trackers. It is currently not of much value and is not
-            recommended for use by default.
-          
-
-          webbugs
-
-          
-            Webbugs are small, invisible images (technically 1X1 GIF
-            images), that are used to track users across websites, and
-            collect information on them. As an HTML page is loaded by the
-            browser, an embedded image tag causes the browser to contact a
-            third-party site, disclosing the tracking information through the
-            requested URL and/or cookies for that third-party domain, without
-            the user ever becoming aware of the interaction with the
-            third-party site. HTML-ized spam also uses a similar technique to
-            verify email addresses.
-
-            This filter removes the HTML code that loads such "webbugs".
-          
-
-          tiny-textforms
-
-          
-            A rather special-purpose filter that can be used to enlarge
-            textareas (those multi-line text boxes in web forms) and turn off
-            hard word wrap in them. It was written for the sourceforge.net
-            tracker system where such boxes are a nuisance, but it can be
-            handy on other sites, too.
-
-            It is not recommended to use this filter as a default.
-          
-
-          jumping-windows
-
-          
-            Many consider windows that move, or resize themselves to be
-            abusive. This filter neutralizes the related JavaScript code.
-            Note that some sites might not display or behave as intended when
-            using this filter. Use with caution.
-          
-
-          frameset-borders
-
-          
-            Some web designers seem to assume that everyone in the world
-            will view their web sites using the same browser brand and
-            version, screen resolution etc, because only that assumption
-            could explain why they'd use static frame sizes, yet prevent
-            their frames from being resized by the user, should they be too
-            small to show their whole content.
-
-            This filter removes the related HTML code. It should only be
-            applied to sites which need it.
-          
-
-          demoronizer
-
-          
-            Many Microsoft products that generate HTML use non-standard
-            extensions (read: violations) of the ISO 8859-1 aka Latin-1
-            character set. This can cause those HTML documents to display
-            with errors on standard-compliant platforms.
-
-            This filter translates the MS-only characters into Latin-1
-            equivalents. It is not necessary when using MS products, and will
-            cause corruption of all documents that use 8-bit character sets
-            other than Latin-1. It's mostly worthwhile for Europeans on
-            non-MS platforms, if weird garbage characters sometimes appear on
-            some pages, or user agents that don't correct for this on the
-            fly.
-          
-
-          shockwave-flash
-
-          
-            A filter for shockwave haters. As the name suggests, this
-            filter strips code out of web pages that is used to embed
-            shockwave flash objects.
-          
-
-          quicktime-kioskmode
-
-          
-            Change HTML code that embeds Quicktime objects so that
-            kioskmode, which prevents saving, is disabled.
-          
-
-          fun
-
-          
-            Text replacements for subversive browsing fun. Make fun of
-            your favorite Monopolist or play buzzword bingo.
-          
-
-          crude-parental
-
-          
-            A demonstration-only filter that shows how Privoxy can be used to delete web content on
-            a keyword basis.
-          
-
-          ie-exploits
-
-          
-            An experimental collection of text replacements to disable
-            malicious HTML and JavaScript code that exploits known security
-            holes in Internet Explorer.
-
-            Presently, it only protects against Nimda and a cross-site
-            scripting bug, and would need active maintenance to provide more
-            substantial protection.
-          
-
-          site-specifics
-
-          
-            Some web sites have very specific problems, the cure for which
-            doesn't apply anywhere else, or could even cause damage on other
-            sites.
-
-            This is a collection of such site-specific cures which should
-            only be applied to the sites they were intended for, which is
-            what the supplied default.action file
-            does. Users shouldn't need to change anything regarding this
-            filter.
-          
-
-          google
-
-          
-            A CSS based block for Google text ads. Also removes a width
-            limitation and the toolbar advertisement.
-          
-
-          yahoo
-
-          
-            Another CSS based block, this time for Yahoo text ads. And
-            removes a width limitation as well.
-          
-
-          msn
-
-          
-            Another CSS based block, this time for MSN text ads. And
-            removes tracking URLs, as well as a width limitation.
-          
-
-          blogspot
-
-          
-            Cleans up some Blogspot blogs. Read the fine print before
-            using this one!
-
-            This filter also intentionally removes some navigation stuff
-            and sets the page width to 100%. As a result, some rounded
-            "corners" would appear too early or not
-            at all and as fixing this would require a browser that
-            understands background-size (CSS3), they are removed instead.
-          
-
-          xml-to-html
-
-          
-            Server-header filter to change the Content-Type from xml to
-            html.
-          
-
-          html-to-xml
-
-          
-            Server-header filter to change the Content-Type from html to
-            xml.
-          
-
-          no-ping
-
-          
-            Removes the non-standard ping
-            attribute from anchor and area HTML tags.
-          
-
-          hide-tor-exit-notation
-
-          
-            Client-header filter to remove the Tor
-            exit node notation found in Host and Referer headers.
-
-            If Privoxy and Tor are chained and Privoxy is configured to use socks4a, one
-            can use "http://www.example.org.foobar.exit/" to access
-            the host "www.example.org" through the
-            Tor exit node "foobar".
-
-            As the HTTP client isn't aware of this notation, it treats the
-            whole string "www.example.org.foobar.exit" as host and uses it
-            for the "Host" and "Referer" headers. From the server's point of view
-            the resulting headers are invalid and can cause problems.
-
-            An invalid "Referer" header can
-            trigger "hot-linking" protections, an
-            invalid "Host" header will make it
-            impossible for the server to find the right vhost (several
-            domains hosted on the same IP address).
-
-            This client-header filter removes the "foo.exit" part in those headers to prevent the
-            mentioned problems. Note that it only modifies the HTTP headers,
-            it doesn't make it impossible for the server to detect your
-            Tor exit node based on the IP address the
-            request is coming from.
-          
-        
-      
-    
-  
-
-  
-    
-
-    
-      
-        
-
-        
-
-        
-      
-
-      
-        
-
-        
-
-        
-      
-  
-
+  
 
+

9. Filter - Files

9.1. Filter File - Tutorial

+ 9. Filter Files +

+ 9.1. Filter File Tutorial +

+ 9.2. The Pre-defined Filters +

+ 9.3. External filter syntax +

9.2. The Pre-defined Filters