Privoxy 3.0.21 User Manual

9. Filter Files

On-the-fly text substitutions need to be defined in a "filter file". Once defined, they can then be invoked as an "action".

Privoxy supports three different filter actions: filter to rewrite the content that is sent to the client, client-header-filter to rewrite headers that are sent by the client, and server-header-filter to rewrite headers that are sent by the server.

Privoxy also supports two tagger actions: client-header-tagger and server-header-tagger. Taggers and filters use the same syntax in the filter files; the difference is that taggers don't modify the text they are filtering, but use a rewritten version of the filtered text as a tag. The tags can then be used to change the applying actions through sections with tag-patterns.

Multiple filter files can be defined through the filterfile config directive. The filters supplied by the developers are located in default.filter. It is recommended that any locally defined or modified filters go in a separately defined file such as user.filter.

Common tasks for content filters are to eliminate common annoyances in HTML and JavaScript, such as pop-up windows, exit consoles, crippled windows without navigation tools, and the infamous <BLINK> tag; to suppress images with certain width and height attributes (standard banner sizes or web-bugs); or just to have fun.

Enabled content filters are applied to any content whose "Content-Type" header is recognised as a sign of text-based content, with the exception of text/plain. Use the force-text-mode action to also filter other content.

Substitutions are made at the source level, so if you want to "roll your own" filters, you should first be familiar with HTML syntax and, of course, regular expressions.

Just like the actions files, the filter file is organized in sections, which are called filters here. Each filter consists of a heading line that starts with one of the keywords FILTER:, CLIENT-HEADER-FILTER: or SERVER-HEADER-FILTER:, followed by the filter's name and a short (one-line) description of what it does. Below that line come the jobs, i.e. lines that define the actual text substitutions. By convention, the name of a filter should describe what the filter eliminates. The comment is used in the web-based user interface.

Once a filter called name has been defined in the filter file, it can be invoked by using an action of the form +filter{name} in any actions file.

Filter definitions start with a header line that contains the filter type, the filter name and the filter description. A content filter header line for a filter called "foo" could look like this:

FILTER: foo Replace all "foo" with "bar"

Below that line, and up to the next header line, come the jobs that define what text replacements the filter executes. They are specified in a syntax that imitates Perl's s/// operator. If you are familiar with Perl, you will find this to be quite intuitive, and may want to look at the PCRS documentation for the subtle differences from Perl behaviour. Most notably, the non-standard option letter U is supported, which turns the default to ungreedy matching.

If you are new to "Regular Expressions", you might want to take a look at the Appendix on regular expressions, and see the Perl manual for the s/// operator's syntax and Perl-style regular expressions in general. The examples below might also help to get you started.

9.1. Filter File Tutorial

Now, let's complete our "foo" content filter. We have already defined the heading, but the jobs are still missing. Since all it does is replace "foo" with "bar", there is only one (trivial) job needed:

s/foo/bar/


But wait! Didn't the comment say that all occurrences of "foo" should be replaced? Our current job will only take care of the first "foo" on each page. For global substitution, we'll need to add the g option:

s/foo/bar/g
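
The effect of the g option can be tried out outside Privoxy. The following is only a sketch: it uses Python's re module as a stand-in for PCRS (an assumption made for illustration; the two engines are not identical), where count=1 plays the role of a job without g.

```python
import re

page = "foo and more foo"

# Without the g option, only the first match is replaced (count=1 here).
first_only = re.sub(r"foo", "bar", page, count=1)

# With the g option, every match is replaced (the default for re.sub).
global_sub = re.sub(r"foo", "bar", page)

print(first_only)
print(global_sub)
```

The first line printed still contains one "foo", the second contains none.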


+      Our complete filter now looks like this:
+
+      
+        
+          
-          
-        
+             FILTER: foo Replace all "foo" with "bar"
 s/foo/bar/g
 
-            
-
-        
-          Let's look at some real filters for more interesting examples. Here
-          you see a filter that protects against some common annoyances that
-          arise from JavaScript abuse. Let's look at its jobs one after the
-          other:
-        
-        
-        
-        
-          
-            
+        
+      
-+          
+
+      Let's look at some real filters for more interesting examples. Here
+      you see a filter that protects against some common annoyances that
+      arise from JavaScript abuse. Let's look at its jobs one after the
+      other:
+
+      
+        
+          
-          
-        
+             FILTER: js-annoyances Get rid of particularly annoying JavaScript abuse
 
 # Get rid of JavaScript referrer tracking. Test page: http://www.randomoddness.com/untitled.htm
 #
 s|(<script.*)document\.referrer(.*</script>)|$1"Not Your Business!"$2|Usg
 
-            
-
-        
-          Following the header line and a comment, you see the job. Note that
-          it uses | as the delimiter instead of /, because the pattern contains a forward
-          slash, which would otherwise have to be escaped by a backslash (\).
-        
-        
-          Now, let's examine the pattern: it starts with the text <script.* enclosed in parentheses. Since the dot
-          matches any character, and * means: "Match an arbitrary number of the element left of
-          myself", this matches "<script", followed by any text, i.e. it matches the whole
-          page, from the start of the first <script> tag.
-        
-        
-          That's more than we want, but the pattern continues: document\.referrer matches only the exact string
-          "document.referrer". The dot needed to
-          be escaped,
-          i.e. preceded by a backslash, to take away its special meaning as a
-          joker, and make it just a regular dot. So far, the meaning is:
-          Match from the start of the first <script> tag in a the page,
-          up to, and including, the text "document.referrer", if both are present in the page (and
-          appear in that order).
-        
-        
-          But there's still more pattern to go. The next element, again
-          enclosed in parentheses, is .*</script>. You already know what .* means, so the whole pattern translates to: Match
-          from the start of the first <script> tag in a page to the end
-          of the last <script> tag, provided that the text "document.referrer" appears somewhere in between.
-        
-        
-          This is still not the whole story, since we have ignored the
-          options and the parentheses: The portions of the page matched by
-          sub-patterns that are enclosed in parentheses, will be remembered
-          and be available through the variables $1, $2,
-          ... in the substitute. The U option
-          switches to ungreedy matching, which means that the first .* in the pattern will only "eat up" all text in between "<script" and the first occurrence of "document.referrer", and that the second .* will only span the text up to the first "</script>" tag. Furthermore, the s option says that the match may span multiple lines
-          in the page, and the g option again means
-          that the substitution is global.
-        
-        
-          So, to summarize, the pattern means: Match all scripts that contain
-          the text "document.referrer". Remember
-          the parts of the script from (and including) the start tag up to
-          (and excluding) the string "document.referrer" as $1,
-          and the part following that string, up to and including the closing
-          tag, as $2.
-        
-        
-          Now the pattern is deciphered, but wasn't this about substituting
-          things? So lets look at the substitute: $1"Not
-          Your Business!"$2 is easy to read: The text remembered as $1, followed by "Not Your
-          Business!" (including the quotation marks!), followed by
-          the text remembered as $2. This produces
-          an exact copy of the original string, with the middle part (the
-          "document.referrer") replaced by "Not Your Business!".
-        
-        
-          The whole job now reads: Replace "document.referrer" by "Not Your
-          Business!" wherever it appears inside a <script> tag.
-          Note that this job won't break JavaScript syntax, since both the
-          original and the replacement are syntactically valid string
-          objects. The script just won't have access to the referrer
-          information anymore.
-        
-        
-          We'll show you two other jobs from the JavaScript taming
-          department, but this time only point out the constructs of special
-          interest:
-        
-        
-        
-        
-          
-            
+        
+      
-+          
+
+      Following the header line and a comment, you see the job. Note that
+      it uses | as the delimiter instead of
+      /, because the pattern contains a forward
+      slash, which would otherwise have to be escaped by a backslash
+      (\).
+
+      Now, let's examine the pattern: it starts with the text <script.* enclosed in parentheses. Since the dot
+      matches any character, and * means:
+      "Match an arbitrary number of the element left of
+      myself", this matches "<script",
+      followed by any
+      text, i.e. it matches the whole page, from the start of the first
+      <script> tag.
+
+      That's more than we want, but the pattern continues: document\.referrer matches only the exact string
+      "document.referrer". The dot needed to be
+      escaped, i.e.
+      preceded by a backslash, to take away its special meaning as a wildcard,
+      and make it just a regular dot. So far, the meaning is: Match from the
+      start of the first <script> tag in the page, up to, and
+      including, the text "document.referrer", if
+      both are present
+      in the page (and appear in that order).
+
+      But there's still more pattern to go. The next element, again
+      enclosed in parentheses, is .*</script>.
+      You already know what .* means, so the whole
+      pattern translates to: Match from the start of the first <script>
+      tag in a page to the end of the last </script> tag, provided that
+      the text "document.referrer" appears
+      somewhere in between.
+
+      This is still not the whole story, since we have ignored the options
+      and the parentheses: The portions of the page matched by sub-patterns
+      that are enclosed in parentheses, will be remembered and be available
+      through the variables $1, $2, ... in the
+      substitute. The U option switches to ungreedy
+      matching, which means that the first .* in the
+      pattern will only "eat up" all text in
+      between "<script" and the first occurrence of
+      "document.referrer", and that the second
+      .* will only span the text up to the
+      first
+      "</script>" tag. Furthermore, the
+      s option says that the match may span multiple
+      lines in the page, and the g option again
+      means that the substitution is global.
+
+      So, to summarize, the pattern means: Match all scripts that contain
+      the text "document.referrer". Remember the
+      parts of the script from (and including) the start tag up to (and
+      excluding) the string "document.referrer" as
+      $1, and the part following that string, up to
+      and including the closing tag, as $2.
+
+      Now the pattern is deciphered, but wasn't this about substituting
+      things? So let's look at the substitute: $1"Not Your
+      Business!"$2 is easy to read: The text remembered as $1, followed by "Not Your
+      Business!" (including the quotation marks!), followed by the
+      text remembered as $2. This produces an exact
+      copy of the original string, with the middle part (the "document.referrer") replaced by "Not Your Business!".
+
+      The whole job now reads: Replace "document.referrer" by "Not Your
+      Business!" wherever it appears inside a <script> tag. Note
+      that this job won't break JavaScript syntax, since both the original
+      and the replacement are syntactically valid string objects. The script
+      just won't have access to the referrer information anymore.
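
To see what the U option buys you, the job can be approximated in Python. This is a hedged sketch, not Privoxy code: Python's re module has no U option, so the lazy quantifier .*? stands in for it, re.DOTALL stands in for the s option, and the sample page is made up.

```python
import re

page = (
    "<script>a = document.referrer;</script>\n"
    "<script>\nb = document.referrer;\n</script>"
)
substitute = r'\1"Not Your Business!"\2'

# Ungreedy, as with the U option: each script is matched on its own.
ungreedy = re.sub(
    r"(<script.*?)document\.referrer(.*?</script>)",
    substitute, page, flags=re.DOTALL,
)

# Greedy, as without U: one match spans from the first <script to the
# last </script>, so only the last document.referrer is replaced.
greedy = re.sub(
    r"(<script.*)document\.referrer(.*</script>)",
    substitute, page, flags=re.DOTALL,
)

print(ungreedy)
print(greedy)
```

In the ungreedy result both references are replaced; in the greedy result the first script still reads document.referrer.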
+
+      We'll show you two other jobs from the JavaScript taming department,
+      but this time only point out the constructs of special interest:
+
+      
+        
+          
-          
-        
+             # The status bar is for displaying link targets, not pointless blahblah
 #
 s/window\.status\s*=\s*(['"]).*?\1/dUmMy=1/ig
 
-            
-
-        
-          \s stands for whitespace characters
-          (space, tab, newline, carriage return, form feed), so that \s* means: "zero or more
-          whitespace". The ? in .*? makes this matching of arbitrary text ungreedy.
-          (Note that the U option is not set). The
-          ['"] construct means: "a single or a double quote". Finally, \1 is a back-reference to the first
-          parenthesis just like $1 above, with the
-          difference that in the pattern, a backslash indicates a
-          back-reference, whereas in the substitute, it's the dollar.
-        
-        
-          So what does this job do? It replaces assignments of single- or
-          double-quoted strings to the "window.status" object with a dummy assignment
-          (using a variable name that is hopefully odd enough not to conflict
-          with real variables in scripts). Thus, it catches many cases where
-          e.g. pointless descriptions are displayed in the status bar instead
-          of the link target when you move your mouse over links.
-        
-        
-        
-        
-          
-            
+        
+      
-+          
+
+      \s stands for whitespace characters (space,
+      tab, newline, carriage return, form feed), so that \s* means: "zero or more
+      whitespace". The ? in .*? makes this matching of arbitrary text ungreedy.
+      (Note that the U option is not set). The
+      ['"] construct means: "a
+      single or a
+      double quote". Finally, \1 is a
+      back-reference to the first parenthesis just like $1 above, with the difference that in the pattern, a backslash
+      indicates a back-reference, whereas in the substitute, it's the
+      dollar.
+
+      So what does this job do? It replaces assignments of single- or
+      double-quoted strings to the "window.status"
+      object with a dummy assignment (using a variable name that is hopefully
+      odd enough not to conflict with real variables in scripts). Thus, it
+      catches many cases where e.g. pointless descriptions are displayed in
+      the status bar instead of the link target when you move your mouse over
+      links.
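
The back-reference is the part that is easiest to get wrong, so here is a small sketch of it. It assumes Python's re module as a stand-in for PCRS, and the script snippet is invented for the example.

```python
import re

# \1 must match the same quote character that group 1 captured.
pattern = re.compile(r"""window\.status\s*=\s*(['"]).*?\1""", re.IGNORECASE)

script = 'window.status = "Don\'t miss it!"; WINDOW.STATUS=\'hi\'; go();'
result = pattern.sub("dUmMy=1", script)

print(result)
```

The apostrophe inside the double-quoted string does not end the first match, because \1 only accepts the quote character that opened the string.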
+
+      
+        
+          
-          
-        
+             # Kill OnUnload popups. Yummy. Test: http://www.zdnet.com/zdsubs/yahoo/tree/yfs.html
 #
 s/(<body [^>]*)onunload(.*>)/$1never$2/iU
 
-            
-
-        
-          Including the OnUnload event binding in the HTML DOM was a
-          CRIME. When I
-          close a browser window, I want it to close and die. Basta. This job
-          replaces the "onunload" attribute in
-          "<body>" tags with the dummy word
-          never. Note that the i option makes the pattern matching
-          case-insensitive. Also note that ungreedy matching alone doesn't
-          always guarantee a minimal match: In the first parenthesis, we had
-          to use [^>]* instead of .* to prevent the match from exceeding the
-          <body> tag if it doesn't contain "OnUnload", but the page's content does.
-        
-        
-          The last example is from the fun department:
-        
-        
-        
-        
-          
-            
+        
+      
-+          
+
+      Including the OnUnload event binding in the HTML DOM was a
+      CRIME. When I
+      close a browser window, I want it to close and die. Basta. This job
+      replaces the "onunload" attribute in
+      "<body>" tags with the dummy word
+      never. Note that the i option makes the pattern matching case-insensitive.
+      Also note that ungreedy matching alone doesn't always guarantee a
+      minimal match: In the first parenthesis, we had to use [^>]* instead of .* to
+      prevent the match from exceeding the <body> tag if it doesn't
+      contain "OnUnload", but the page's content
+      does.
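
The difference between [^>]* and .* in the first parenthesis can be checked with a short sketch. Again Python's re module is assumed as a stand-in for PCRS, lazy quantifiers replace the U option, and both sample pages as well as the "naive" pattern are hypothetical.

```python
import re

# Pattern with [^>]*, as in the job above.
safe = r"(<body [^>]*?)onunload(.*?>)"
# Hypothetical variant that uses .* in the first parenthesis instead.
naive = r"(<body .*?)onunload(.*?>)"

with_handler = '<body bgcolor="white" OnUnload="popup()">\n<p>text</p>'
without_handler = '<body bgcolor="white"><p onunload="x()">hi</p>'

def run_job(pattern, page):
    # count=1 mirrors the missing g option: only the first match is replaced.
    return re.sub(pattern, r"\1never\2", page, count=1, flags=re.IGNORECASE)

fixed = run_job(safe, with_handler)
untouched = run_job(safe, without_handler)
overreach = run_job(naive, without_handler)

print(fixed)
print(untouched)
print(overreach)
```

Only the naive pattern reaches past the <body> tag and rewrites the attribute of the <p> tag.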
+
+      The last example is from the fun department:
+
+      
+        
+          
-          
-        
+             FILTER: fun Fun text replacements
 
 # Spice the daily news:
 #
 s/microsoft(?!\.com)/MicroSuck/ig
 
-            
-
-        
-          Note the (?!\.com) part (a so-called
-          negative lookahead) in the job's pattern, which means: Don't match,
-          if the string ".com" appears directly
-          following "microsoft" in the page. This
-          prevents links to microsoft.com from being trashed, while still
-          replacing the word everywhere else.
-        
-        
-        
-        
-          
-            
+        
+      
-+          
+
+      Note the (?!\.com) part (a so-called
+      negative lookahead) in the job's pattern, which means: Don't match, if
+      the string ".com" appears directly following
+      "microsoft" in the page. This prevents links
+      to microsoft.com from being trashed, while still replacing the word
+      everywhere else.
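
A quick way to convince yourself that the lookahead works is the following sketch, which assumes Python's re module as a stand-in for PCRS and an invented sample sentence.

```python
import re

text = "Microsoft announced it at www.microsoft.com today."

# (?!\.com) rejects a match that is directly followed by ".com".
result = re.sub(r"microsoft(?!\.com)", "MicroSuck", text, flags=re.IGNORECASE)

print(result)
```

The company name at the start of the sentence is replaced, while the host name is left alone.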
+
+      
+        
+          
-          
-        
+             # Buzzword Bingo (example for extended regex syntax)
 #
 s* industry[ -]leading \
@@ -469,539 +404,454 @@ s* industry[ -]leading \
 *<font color="red"><b>BINGO!</b></font> \
 *igx
 
-            
-
-        
-          The x option in this job turns on extended
-          syntax, and allows for e.g. the liberal use of (non-interpreted!)
-          whitespace for nicer formatting.
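
Extended syntax can be sketched as follows. The example assumes Python's re.VERBOSE flag as a rough equivalent of the x option; the pattern is not the job shown above, and the second alternative is purely hypothetical.

```python
import re

buzzwords = re.compile(
    r"""
      industry[ -]leading   # the space inside [ -] is part of the pattern
    | cutting[ -]edge       # hypothetical second alternative
    """,
    re.VERBOSE | re.IGNORECASE,
)

text = "An Industry-Leading, cutting edge product"
result = buzzwords.sub("BINGO!", text)

print(result)
```

Whitespace and comments outside a character class are ignored, which is why the pattern can be spread over several lines.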
-        
-        
-          You get the idea?
-        
-      
-      
-        
-          9.2. The Pre-defined Filters
-        
-        
-          The distribution default.filter file
-          contains a selection of pre-defined filters for your convenience:
-        
-        
-          
-            
-              js-annoyances
-            
-            
-              
-                The purpose of this filter is to get rid of particularly
-                annoying JavaScript abuse. To that end, it
-              
-              
-                
-                  
-                    replaces JavaScript references to the browser's referrer
-                    information with the string "Not Your Business!". This
-                    complements the hide-referrer
-                    action on the content level.
-                  
-                
-                
-                  
-                    removes the bindings to the DOM's unload event which we feel has no
-                    right to exist and is responsible for most "exit consoles", i.e. nasty windows that
-                    pop up when you close another one.
-                  
-                
-                
-                  
-                    removes code that causes new windows to be opened with
-                    undesired properties, such as being full-screen,
-                    non-resizeable, without location, status or menu bar etc.
-                  
-                
-              
-
-              
-                Use with caution. This is an aggressive filter, and can break
-                sites that rely heavily on JavaScript.
-              
-            
-            
-              js-events
-            
-            
-              
-                This is a very radical measure. It removes virtually all
-                JavaScript event bindings, which means that scripts cannot
-                react to user actions such as mouse movements or clicks,
-                window resizing etc, anymore. Use with caution!
-              
-              
-                We strongly
-                discourage using this filter as a default since it
-                breaks many legitimate scripts. It is meant for use only on
-                extra-nasty sites (should you really need to go there).
-              
-            
-            
-              html-annoyances
-            
-            
-              
-                This filter will undo many common instances of HTML based
-                abuse.
-              
-              
-                The BLINK and MARQUEE tags are neutralized (yeah baby!), and
-                browser windows will be created as resizeable (as of course
-                they should be!), and will have location, scroll and menu
-                bars -- even if specified otherwise.
-              
-            
-            
-              content-cookies
-            
-            
-              
-                Most cookies are set in the HTTP dialog, where they can be
-                intercepted by the crunch-incoming-cookies
-                and crunch-outgoing-cookies
-                actions. But web sites increasingly make use of HTML meta
-                tags and JavaScript to sneak cookies to the browser on the
-                content level.
-              
-              
-                This filter disables most HTML and JavaScript code that reads
-                or sets cookies. It cannot detect all clever uses of these
-                types of code, so it should not be relied on as an absolute
-                fix. Use it wherever you would also use the cookie crunch
-                actions.
-              
-            
-            
-              refresh-tags
-            
-            
-              
-                Disable any refresh tags if the interval is greater than nine
-                seconds (so that redirections done via refresh tags are not
-                destroyed). This is useful for dial-on-demand setups, or for
-                those who find this HTML feature annoying.
-              
-            
-            
-              unsolicited-popups
-            
-            
-              
-                This filter attempts to prevent only "unsolicited" pop-up windows from opening, yet
-                still allow pop-up windows that the user has explicitly
-                chosen to open. It was added in version 3.0.1, as an
-                improvement over earlier such filters.
-              
-              
-                Technical note: The filter works by redefining the
-                window.open JavaScript function to a dummy function, PrivoxyWindowOpen(), during the loading
-                and rendering phase of each HTML page access, and restoring
-                the function afterward.
-              
-              
-                This is recommended only for browsers that cannot perform
-                this function reliably themselves. And be aware that some
-                sites require such windows in order to function normally. Use
-                with caution.
-              
-            
-            
-              all-popups
-            
-            
-              
-                Attempt to prevent all pop-up windows from opening. Note
-                this should be used with even more discretion than the above,
-                since it is more likely to break some sites that require
-                pop-ups for normal usage. Use with caution.
-              
-            
-            
-              img-reorder
-            
-            
-              
-                This is a helper filter that has no value if used alone. It
-                makes the banners-by-size and banners-by-link (see below) filters more
-                effective and should be enabled together with them.
-              
-            
-            
-              banners-by-size
-            
-            
-              
-                This filter removes image tags purely based on what size they
-                are. Fortunately for us, many ads and banner images tend to
-                conform to certain standardized sizes, which makes this
-                filter quite effective for ad stripping purposes.
-              
-              
-                Occasionally this filter will cause false positives on images
-                that are not ads, but just happen to be of one of the
-                standard banner sizes.
-              
-              
-                Recommended only for those who require extreme ad blocking.
-                The default block rules should catch 95+% of all ads without this
-                filter enabled.
-              
-            
-            
-              banners-by-link
-            
-            
-              
-                This is an experimental filter that attempts to kill any
-                banners if their URLs seem to point to known or suspected
-                click trackers. It is currently not of much value and is not
-                recommended for use by default.
-              
-            
-            
-              webbugs
-            
-            
-              
-                Webbugs are small, invisible images (technically 1X1 GIF
-                images), that are used to track users across websites, and
-                collect information on them. As an HTML page is loaded by the
-                browser, an embedded image tag causes the browser to contact
-                a third-party site, disclosing the tracking information
-                through the requested URL and/or cookies for that third-party
-                domain, without the user ever becoming aware of the
-                interaction with the third-party site. HTML-ized spam also
-                uses a similar technique to verify email addresses.
-              
-              
-                This filter removes the HTML code that loads such "webbugs".
-              
-            
-            
-              tiny-textforms
-            
-            
-              
-                A rather special-purpose filter that can be used to enlarge
-                textareas (those multi-line text boxes in web forms) and turn
-                off hard word wrap in them. It was written for the
-                sourceforge.net tracker system where such boxes are a
-                nuisance, but it can be handy on other sites, too.
-              
-              
-                It is not recommended to use this filter as a default.
-              
-            
-            
-              jumping-windows
-            
-            
-              
-                Many consider windows that move, or resize themselves to be
-                abusive. This filter neutralizes the related JavaScript code.
-                Note that some sites might not display or behave as intended
-                when using this filter. Use with caution.
-              
-            
-            
-              frameset-borders
-            
-            
-              
-                Some web designers seem to assume that everyone in the world
-                will view their web sites using the same browser brand and
-                version, screen resolution etc, because only that assumption
-                could explain why they'd use static frame sizes, yet prevent
-                their frames from being resized by the user, should they be
-                too small to show their whole content.
-              
-              
-                This filter removes the related HTML code. It should only be
-                applied to sites which need it.
-              
-            
-            
-              demoronizer
-            
-            
-              
-                Many Microsoft products that generate HTML use non-standard
-                extensions (read: violations) of the ISO 8859-1 aka Latin-1
-                character set. This can cause those HTML documents to display
-                with errors on standard-compliant platforms.
-              
-              
-                This filter translates the MS-only characters into Latin-1
-                equivalents. It is not necessary when using MS products, and
-                will cause corruption of all documents that use 8-bit
-                character sets other than Latin-1. It's mostly worthwhile for
-                Europeans on non-MS platforms, if weird garbage characters
-                sometimes appear on some pages, or user agents that don't
-                correct for this on the fly.
-              
-            
-            
-              shockwave-flash
-            
-            
-              
-                A filter for shockwave haters. As the name suggests, this
-                filter strips code out of web pages that is used to embed
-                shockwave flash objects.
-              
-              
-              
-            
-            
-              quicktime-kioskmode
-            
-            
-              
-                Change HTML code that embeds Quicktime objects so that
-                kioskmode, which prevents saving, is disabled.
-              
-            
-            
-              fun
-            
-            
-              
-                Text replacements for subversive browsing fun. Make fun of
-                your favorite Monopolist or play buzzword bingo.
-              
-            
-            
-              crude-parental
-            
-            
-              
-                A demonstration-only filter that shows how Privoxy can be used to delete web
-                content on a keyword basis.
-              
-            
-            
-              ie-exploits
-            
-            
-              
-                An experimental collection of text replacements to disable
-                malicious HTML and JavaScript code that exploits known
-                security holes in Internet Explorer.
-              
-              
-                Presently, it only protects against Nimda and a cross-site
-                scripting bug, and would need active maintenance to provide
-                more substantial protection.
-              
-            
-            
-              site-specifics
-            
-            
-              
-                Some web sites have very specific problems, the cure for
-                which doesn't apply anywhere else, or could even cause damage
-                on other sites.
-              
-              
-                This is a collection of such site-specific cures which should
-                only be applied to the sites they were intended for, which is
-                what the supplied default.action
-                file does. Users shouldn't need to change anything regarding
-                this filter.
-              
-            
-            
-              google
-            
-            
-              
-                A CSS based block for Google text ads. Also removes a width
-                limitation and the toolbar advertisement.
-              
-            
-            
-              yahoo
-            
-            
-              
-                Another CSS based block, this time for Yahoo text ads. And
-                removes a width limitation as well.
-              
-            
-            
-              msn
-            
-            
-              
-                Another CSS based block, this time for MSN text ads. And
-                removes tracking URLs, as well as a width limitation.
-              
-            
-            
-              blogspot
-            
-            
-              
-                Cleans up some Blogspot blogs. Read the fine print before
-                using this one!
-              
-              
-                This filter also intentionally removes some navigation stuff
-                and sets the page width to 100%. As a result, some rounded
-                "corners" would appear to early or
-                not at all and as fixing this would require a browser that
-                understands background-size (CSS3), they are removed instead.
-              
-            
-            
-              xml-to-html
-            
-            
-              
-                Server-header filter to change the Content-Type from xml to
-                html.
-              
-            
-            
-              html-to-xml
-            
-            
-              
-                Server-header filter to change the Content-Type from html to
-                xml.
-              
-            
-            
-              no-ping
-            
-            
-              
-                Removes the non-standard ping
-                attribute from anchor and area HTML tags.
-              
-            
-            
-              hide-tor-exit-notation
-            
-            
-              
-                Client-header filter to remove the Tor
-                exit node notation found in Host and Referer headers.
-              
-              
-                If Privoxy and Tor are chained and Privoxy is configured to use socks4a,
-                one can use "http://www.example.org.foobar.exit/" to
-                access the host "www.example.org"
-                through the Tor exit node "foobar".
-              
-              
-                As the HTTP client isn't aware of this notation, it treats
-                the whole string "www.example.org.foobar.exit" as host and uses
-                it for the "Host" and "Referer" headers. From the server's point of
-                view the resulting headers are invalid and can cause
-                problems.
-              
-              
-                An invalid "Referer" header can
-                trigger "hot-linking" protections,
-                an invalid "Host" header will make
-                it impossible for the server to find the right vhost (several
-                domains hosted on the same IP address).
-              
-              
-                This client-header filter removes the "foo.exit" part in those headers to prevent
-                the mentioned problems. Note that it only modifies the HTTP
-                headers, it doesn't make it impossible for the server to
-                detect your Tor exit node based on the
-                IP address the request is coming from.
-              
-            
-          
-        
-      
-    
-    
           
         
       
+
+      The x option in this job turns on extended
+      syntax, and allows for e.g. the liberal use of (non-interpreted!)
+      whitespace for nicer formatting.
+
+      You get the idea?
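
As a rough illustration of what extended syntax buys you, the sketch below uses Python's `re.VERBOSE` flag, which plays a role similar to the x option: whitespace and comments inside the pattern are ignored. The pattern is a hypothetical example and is not taken from default.filter, and Python's regex dialect is only an approximation of the one Privoxy uses.

```python
import re

# Hypothetical pattern for illustration only; not copied from default.filter.
compact = re.compile(r"<blink[^>]*>|</blink>", re.IGNORECASE)

# The same pattern in extended form: whitespace and comments are not
# part of the expression, so it can be laid out for readability.
extended = re.compile(
    r"""
    <blink [^>]* >   # opening tag, with any attributes
    |                # or
    </blink>         # closing tag
    """,
    re.IGNORECASE | re.VERBOSE,
)

sample = '<BLINK class="x">Buy now!</blink>'
print(compact.sub("", sample))
print(extended.sub("", sample))
```

Both expressions match the same text; the extended form is simply easier to read and maintain.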
     
-  
-
 
+    
+      9.2. The Pre-defined Filters
+
+      The distribution default.filter file
+      contains a selection of pre-defined filters for your convenience:
+
+      
+        
+          js-annoyances
+
+          
+            The purpose of this filter is to get rid of particularly
+            annoying JavaScript abuse. To that end, it
+
+            
+              
+                replaces JavaScript references to the browser's referrer
+                information with the string "Not Your Business!". This
+                complements the hide-referrer
+                action on the content level.
+              
+
+              
+                removes the bindings to the DOM's unload event which we feel has no right to
+                exist and is responsible for most "exit
+                consoles", i.e. nasty windows that pop up when you
+                close another one.
+              
+
+              
+                removes code that causes new windows to be opened with
+                undesired properties, such as being full-screen,
+                non-resizeable, without location, status or menu bar etc.
+              
+            
+
+            Use with caution. This is an aggressive filter, and can break
+            sites that rely heavily on JavaScript.
+          
+
+          js-events
+
+          
+            This is a very radical measure. It removes virtually all
+            JavaScript event bindings, which means that scripts can no
+            longer react to user actions such as mouse movements or clicks,
+            window resizing, etc. Use with caution!
+
+            We strongly
+            discourage using this filter as a default since it
+            breaks many legitimate scripts. It is meant for use only on
+            extra-nasty sites (should you really need to go there).
+          
+
+          html-annoyances
+
+          
+            This filter will undo many common instances of HTML based
+            abuse.
+
+            The BLINK and MARQUEE tags are neutralized (yeah baby!), and
+            browser windows will be created as resizeable (as of course they
+            should be!), and will have location, scroll and menu bars -- even
+            if specified otherwise.
+          
+
+          content-cookies
+
+          
+            Most cookies are set in the HTTP dialog, where they can be
+            intercepted by the crunch-incoming-cookies
+            and crunch-outgoing-cookies
+            actions. But web sites increasingly make use of HTML meta tags
+            and JavaScript to sneak cookies to the browser on the content
+            level.
+
+            This filter disables most HTML and JavaScript code that reads
+            or sets cookies. It cannot detect all clever uses of these types
+            of code, so it should not be relied on as an absolute fix. Use it
+            wherever you would also use the cookie crunch actions.
+          
+
+          refresh-tags
+
+          
+            Disable any refresh tags if the interval is greater than nine
+            seconds (so that redirections done via refresh tags are not
+            destroyed). This is useful for dial-on-demand setups, or for
+            those who find this HTML feature annoying.
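
The threshold logic described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the actual refresh-tags filter: the regular expression, the function name `strip_slow_refresh`, and the choice to delete the tag outright are all hypothetical.

```python
import re

# Hypothetical pattern: a meta refresh tag whose content starts with the
# interval in seconds. Real-world markup varies more than this handles.
REFRESH_TAG = re.compile(
    r"<meta\s+http-equiv\s*=\s*[\"']?refresh[\"']?\s+"
    r"content\s*=\s*[\"']?(\d+)[^>]*>",
    re.IGNORECASE,
)


def strip_slow_refresh(html: str, threshold: int = 9) -> str:
    """Remove refresh tags whose interval exceeds `threshold` seconds.

    Short intervals are left alone because they are usually redirects.
    """

    def decide(match: re.Match) -> str:
        interval = int(match.group(1))
        return "" if interval > threshold else match.group(0)

    return REFRESH_TAG.sub(decide, html)


redirect = '<meta http-equiv="refresh" content="0; url=/new">'
reload_ = '<meta http-equiv="refresh" content="30; url=/same">'
print(strip_slow_refresh(redirect + reload_))
```

Keeping the comparison in a callback rather than in the pattern makes the nine-second threshold easy to adjust.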
+          
+
+          unsolicited-popups
+
+          
+            This filter attempts to prevent only "unsolicited" pop-up windows from opening, yet
+            still allow pop-up windows that the user has explicitly chosen to
+            open. It was added in version 3.0.1, as an improvement over
+            earlier such filters.
+
+            Technical note: The filter works by redefining the window.open
+            JavaScript function to a dummy function, PrivoxyWindowOpen(), during the loading and
+            rendering phase of each HTML page access, and restoring the
+            function afterward.
+
+            This is recommended only for browsers that cannot perform this
+            function reliably themselves. And be aware that some sites
+            require such windows in order to function normally. Use with
+            caution.
+          
+
+          all-popups
+
+          
+            Attempt to prevent all pop-up windows from opening. Note this
+            should be used with even more discretion than the above, since it
+            is more likely to break some sites that require pop-ups for
+            normal usage. Use with caution.
+          
+
+          img-reorder
+
+          
+            This is a helper filter that has no value if used alone. It
+            makes the banners-by-size and banners-by-link (see below) filters more effective
+            and should be enabled together with them.
+          
+
+          banners-by-size
+
+          
+            This filter removes image tags purely based on what size they
+            are. Fortunately for us, many ads and banner images tend to
+            conform to certain standardized sizes, which makes this filter
+            quite effective for ad stripping purposes.
+
+            Occasionally this filter will cause false positives on images
+            that are not ads, but just happen to be of one of the standard
+            banner sizes.
+
+            Recommended only for those who require extreme ad blocking.
+            The default block rules should catch 95+% of all ads without this filter
+            enabled.
+          
+
+          banners-by-link
+
+          
+            This is an experimental filter that attempts to kill any
+            banners if their URLs seem to point to known or suspected click
+            trackers. It is currently not of much value and is not
+            recommended for use by default.
+          
+
+          webbugs
+
+          
+            Webbugs are small, invisible images (technically 1x1 GIF
+            images) that are used to track users across websites and
+            collect information on them. As an HTML page is loaded by the
+            browser, an embedded image tag causes the browser to contact a
+            third-party site, disclosing the tracking information through the
+            requested URL and/or cookies for that third-party domain, without
+            the user ever becoming aware of the interaction with the
+            third-party site. HTML-ized spam also uses a similar technique to
+            verify email addresses.
+
+            This filter removes the HTML code that loads such "webbugs".
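
To make the idea concrete, here is a minimal Python sketch that drops image tags declared as 1x1 pixels. It is an assumption-laden illustration rather than the webbugs filter itself: the patterns and the function name `strip_webbugs` are hypothetical, and it only looks at explicit `width` and `height` attributes.

```python
import re

# Hypothetical patterns for illustration; not taken from default.filter.
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
WIDTH = re.compile(r"\bwidth\s*=\s*[\"']?(\d+)", re.IGNORECASE)
HEIGHT = re.compile(r"\bheight\s*=\s*[\"']?(\d+)", re.IGNORECASE)


def strip_webbugs(html: str) -> str:
    """Remove <img> tags whose declared size is exactly 1x1."""

    def decide(match: re.Match) -> str:
        tag = match.group(0)
        width = WIDTH.search(tag)
        height = HEIGHT.search(tag)
        if width and height and width.group(1) == "1" and height.group(1) == "1":
            return ""  # looks like a tracking pixel: drop the whole tag
        return tag

    return IMG_TAG.sub(decide, html)


bug = '<img src="http://tracker.example/p.gif" width="1" height="1">'
photo = '<img src="cat.png" width="100" height="1">'
print(strip_webbugs("<p>" + bug + photo + "</p>"))
```

Checking the attributes separately means the sketch does not depend on the order in which `width` and `height` appear.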
+          
+
+          tiny-textforms
+
+          
+            A rather special-purpose filter that can be used to enlarge
+            textareas (those multi-line text boxes in web forms) and turn off
+            hard word wrap in them. It was written for the sourceforge.net
+            tracker system where such boxes are a nuisance, but it can be
+            handy on other sites, too.
+
+            It is not recommended to use this filter as a default.
+          
+
+          jumping-windows
+
+          
+            Many consider windows that move or resize themselves to be
+            abusive. This filter neutralizes the related JavaScript code.
+            Note that some sites might not display or behave as intended when
+            using this filter. Use with caution.
+          
+
+          frameset-borders
+
+          
+            Some web designers seem to assume that everyone in the world
+            will view their web sites using the same browser brand and
+            version, screen resolution etc, because only that assumption
+            could explain why they'd use static frame sizes, yet prevent
+            their frames from being resized by the user, should they be too
+            small to show their whole content.
+
+            This filter removes the related HTML code. It should only be
+            applied to sites which need it.
+          
+
+          demoronizer
+
+          
+            Many Microsoft products that generate HTML use non-standard
+            extensions (read: violations) of the ISO 8859-1 aka Latin-1
+            character set. This can cause those HTML documents to display
+            with errors on standard-compliant platforms.
+
+            This filter translates the MS-only characters into Latin-1
+            equivalents. It is not necessary when using MS products, and will
+            cause corruption of all documents that use 8-bit character sets
+            other than Latin-1. It's mostly worthwhile for Europeans on
+            non-MS platforms who sometimes see weird garbage characters on
+            some pages, or for user agents that don't correct for this on
+            the fly.
+          
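
The kind of translation involved can be sketched in Python as a byte-level lookup. The table below covers only a handful of Windows-1252 code points and the ASCII replacements are an assumption made for this example; the real filter's substitutions may differ. The function name `demoronize` is likewise hypothetical.

```python
# A few bytes that Windows-1252 assigns to printable characters but that
# ISO 8859-1 reserves for control codes, mapped to plain equivalents.
# The choice of replacements is an assumption made for this sketch.
CP1252_TO_LATIN1 = {
    0x85: b"...",  # horizontal ellipsis
    0x91: b"'",    # left single quotation mark
    0x92: b"'",    # right single quotation mark
    0x93: b'"',    # left double quotation mark
    0x94: b'"',    # right double quotation mark
    0x96: b"-",    # en dash
    0x97: b"-",    # em dash
}


def demoronize(data: bytes) -> bytes:
    """Replace the mapped Windows-1252 bytes; pass everything else through."""
    return b"".join(CP1252_TO_LATIN1.get(byte, bytes([byte])) for byte in data)


print(demoronize(b"\x93Hello\x94 \x96 it\x92s fine\x85"))
```

Because the lookup works on raw bytes, it would also rewrite those byte values in documents that use a different 8-bit character set, which is exactly the corruption risk described above.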
+
+          shockwave-flash
+
+          
+            A filter for shockwave haters. As the name suggests, this
+            filter strips code out of web pages that is used to embed
+            shockwave flash objects.
+          
+
+          quicktime-kioskmode
+
+          
+            Change HTML code that embeds QuickTime objects so that
+            kioskmode, which prevents saving, is disabled.
+          
+
+          fun
+
+          
+            Text replacements for subversive browsing fun. Make fun of
+            your favorite Monopolist or play buzzword bingo.
+          
+
+          crude-parental
+
+          
+            A demonstration-only filter that shows how Privoxy can be used to delete web content on
+            a keyword basis.
+          
+
+          ie-exploits
+
+          
+            An experimental collection of text replacements to disable
+            malicious HTML and JavaScript code that exploits known security
+            holes in Internet Explorer.
+
+            Presently, it only protects against Nimda and a cross-site
+            scripting bug, and would need active maintenance to provide more
+            substantial protection.
+          
+
+          site-specifics
+
+          
+            Some web sites have very specific problems, the cure for which
+            doesn't apply anywhere else, or could even cause damage on other
+            sites.
+
+            This is a collection of such site-specific cures which should
+            only be applied to the sites they were intended for, which is
+            what the supplied default.action file
+            does. Users shouldn't need to change anything regarding this
+            filter.
+          
+
+          google
+
+          
+            A CSS based block for Google text ads. Also removes a width
+            limitation and the toolbar advertisement.
+          
+
+          yahoo
+
+          
+            Another CSS based block, this time for Yahoo text ads. It also
+            removes a width limitation.
+          
+
+          msn
+
+          
+            Another CSS based block, this time for MSN text ads. It also
+            removes tracking URLs and a width limitation.
+          
+
+          blogspot
+
+          
+            Cleans up some Blogspot blogs. Read the fine print before
+            using this one!
+
+            This filter also intentionally removes some navigation stuff
+            and sets the page width to 100%. As a result, some rounded
+            "corners" would appear too early or not
+            at all. As fixing this would require a browser that
+            understands background-size (CSS3), they are removed instead.
+          
+
+          xml-to-html
+
+          
+            Server-header filter to change the Content-Type from xml to
+            html.
+          
+
+          html-to-xml
+
+          
+            Server-header filter to change the Content-Type from html to
+            xml.
+          
+
+          no-ping
+
+          
+            Removes the non-standard ping
+            attribute from anchor and area HTML tags.
+          
+
+          hide-tor-exit-notation
+
+          
+            Client-header filter to remove the Tor
+            exit node notation found in Host and Referer headers.
+
+            If Privoxy and Tor are chained and Privoxy is configured to use socks4a, one
+            can use "http://www.example.org.foobar.exit/" to access
+            the host "www.example.org" through the
+            Tor exit node "foobar".
+
+            As the HTTP client isn't aware of this notation, it treats the
+            whole string "www.example.org.foobar.exit" as host and uses it
+            for the "Host" and "Referer" headers. From the server's point of view
+            the resulting headers are invalid and can cause problems.
+
+            An invalid "Referer" header can
+            trigger "hot-linking" protections, and an
+            invalid "Host" header will make it
+            impossible for the server to find the right vhost (several
+            domains hosted on the same IP address).
+
+            This client-header filter removes the "foobar.exit" part in those headers to prevent the
+            mentioned problems. Note that it only modifies the HTTP headers;
+            it doesn't make it impossible for the server to detect your
+            Tor exit node based on the IP address the
+            request is coming from.
+          
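
The header rewrite described above can be approximated with the following Python sketch. It is an illustration under stated assumptions, not the filter shipped with Privoxy: the pattern, the function name `hide_exit_notation`, and the assumption that the exit node name consists only of letters, digits and hyphens are all hypothetical.

```python
import re

# Hypothetical pattern: ".<nodename>.exit" at the end of a host name,
# i.e. followed by ":" (port), "/" (path) or the end of the value.
EXIT_NOTATION = re.compile(r"\.[A-Za-z0-9-]+\.exit(?=[:/]|$)")


def hide_exit_notation(header: str) -> str:
    """Strip the exit node notation from Host and Referer headers only.

    `header` is a single header line without the trailing CRLF.
    """
    name, sep, value = header.partition(":")
    if name.strip().lower() not in ("host", "referer"):
        return header
    return name + sep + EXIT_NOTATION.sub("", value)


print(hide_exit_notation("Host: www.example.org.foobar.exit"))
print(hide_exit_notation("Referer: http://www.example.org.foobar.exit/index.html"))
```

As the text notes, a rewrite like this only cleans up the headers; it does nothing to hide the IP address the request arrives from.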
+        
+      
+    
+  
+