X-Git-Url: http://www.privoxy.org/gitweb/?p=privoxy.git;a=blobdiff_plain;f=doc%2Fwebserver%2Fuser-manual%2Fappendix.html;h=8cc189f0cda12bbfff0fba6cb47e36b17512362b;hp=0546aff137fb9c23d704eb655dd01de5b0526149;hb=72081f829de368392d04076728f8c991178c0080;hpb=16e9ef297b4cf15a61876abcc794e5a058500e4b diff --git a/doc/webserver/user-manual/appendix.html b/doc/webserver/user-manual/appendix.html index 0546aff1..8cc189f0 100644 --- a/doc/webserver/user-manual/appendix.html +++ b/doc/webserver/user-manual/appendix.html @@ -1,12 +1,13 @@ +
Privoxy User Manual | Privoxy 3.0.3 User Manual
- 9. Appendix
+ 14. Appendix

- 9.1. Regular Expressions
+ 14.1. Regular Expressions

- Privoxy can use "regular expressions" in various config files. Assuming support for "pcre" (Perl Compatible Regular Expressions) is compiled in, which is the default. Such configuration directives do not require regular expressions, but they can be used to increase flexibility by matching a pattern with wild-cards against URLs.
+ Privoxy uses Perl-style "regular expressions" in its actions files and filter file, through the PCRE and PCRS libraries.

If you are reading this, you probably don't understand what "regular expressions" are, or what they can do. So this will be a very brief introduction only. A full explanation would require a book ;-)

- "Regular expressions" is a way of matching one character expression against another to see if it matches or not. One of the "expressions" is a literal string of readable characters (letter, numbers, etc), and the other is a complex string of literal characters combined with wild-cards, and other special characters, called meta-characters. The "meta-characters" have special meanings and are used to build the complex pattern to be matched against. Perl Compatible Regular Expressions is an enhanced form of the regular expression language with backward compatibility.
+ Regular expressions provide a language to describe patterns that can be run against strings of characters (letter, numbers, etc), to see if they match the string or not. The patterns are themselves (sometimes complex) strings of literal characters, combined with wild-cards, and other special characters, called meta-characters. The "meta-characters" have special meanings and are used to build complex patterns to be matched against. Perl Compatible Regular Expressions are an especially convenient "dialect" of the regular expression language.

To make a simple analogy, we do something similar when we use wild-card
characters when listing files with the dir command in DOS.
*.* matches all filenames. The "special"
character here is the asterisk which matches any and all characters. We can be
more specific and use ? to match just individual
characters. So
These are just some of the ones you are likely to use when matching URLs with
/.*/banners/.* - A simple example that uses the common combination of

And now something a little more complex:

/.*/adv((er)?ts?|ertis(ing|ements?))?/ - We have several literal forward slashes again (

".*", so we are matching against any conceivable sub-path, just so it matches our expression. The only true literal that must match

/.*/advert[0-9]+\.(gif|jpe?g) - Again another path statement with forward slashes. Anything in the square brackets

is not in the expression anywhere).

- s/microsoft(?!.com)/MicroSuck/i - This is a substitution. "MicroSuck" will replace any occurrence of "microsoft". The "i" at the end of the expression means ignore case. The "(?!.com)" means the match should fail if "microsoft" is followed by ".com". In other words, this acts like a "NOT" modifier. In case this is a hyperlink, we don't want to break it ;-).

We are barely scratching the surface of regular expressions here so that you can understand the default

http://www.perldoc.com/perl5.6/pod/perlre.html

+ For information on regular expression based substitutions and their applications in filters, please see the filter file tutorial in this manual.

- 9.2. Privoxy's Internal Pages
+ 14.2. Privoxy's Internal Pages

- Alternately, this may be reached at http://p.p/, but this variation may not work as reliably as the above in some configurations.
+ There is a shortcut: http://p.p/ (But it doesn't provide a fall-back to a real page, in case the request is not sent through Privoxy)

- Show information about the current configuration:
+ Show information about the current configuration, including viewing and editing of actions files:
Notice the only difference here to the previous listing, is to "fast-redirects" and "session-cookies-only".
Now another example, "ad.doubleclick.net"
- { +block +image }
+ { +block +handle-as-image }
  .ad.doubleclick.net

- { +block +image }
+ { +block +handle-as-image }
  ad*.

- { +block +image }
- .doubleclick.net
Any one of these would have done the trick and blocked this as an unwanted
@@ -1309,27 +1693,38 @@
"ad.doubleclick.net"
- is done here -- as both a "+block" and an "+image". The custom alias "+imageblock" does this for us.
+ is done here -- as both a "+block" and an "+handle-as-image". The custom alias "+imageblock" just simplifies the process and make it more readable.

One last example. Let's try "http://www.rhapsodyk.net/adsl/HOWTO/". This one is giving us problems. We are getting a blank page. Hmmm...
Now the page displays ;-) Be sure to flush your browser's caches when making such changes. Or, try using Shift+Reload.
But now what about a situation where we get no explicit matches like
@@ -1411,10 +1835,8 @@
{ -block }
- /adsl
-
-{ +block +handle-as-image }
+ /ads
This would turn off all filtering for that site. This would probably be most appropriately put in user.action, for local site exceptions.
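A local exception of this kind might look roughly like the following in user.action. This is only a sketch: the host pattern .example.com is a placeholder chosen for illustration, not a site named in this manual, and the comment lines are not taken from any shipped file.

```
# user.action (illustrative sketch; .example.com is a hypothetical site)
# "-filter" with no argument switches off all filters for matching URLs.
{ -filter }
.example.com
```

The leading dot in the pattern makes it apply to the domain itself and to any of its sub-domains.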
Images that are inexplicably being blocked, may well be hitting the "+filter{banners-by-size}" rule, which assumes that images of certain sizes are ad banners (works well most of the time since these tend to be standardized).
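If that rule does turn out to be the culprit, one possible remedy is to disable just that one filter for the affected site and leave the others active. The sketch below assumes the exception goes into user.action, and .example.com is again a hypothetical placeholder, not a site from this manual.

```
# user.action (illustrative sketch; .example.com is a hypothetical site)
# Disable only the banners-by-size filter; other filters stay in effect.
{ -filter{banners-by-size} }
.example.com
```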
"{fragile}"