X-Git-Url: http://www.privoxy.org/gitweb/?a=blobdiff_plain;f=doc%2Fwebserver%2Fuser-manual%2Fappendix.html;h=b02a2549f3a131005eeb99b706c559e3ea9f9e91;hb=00ff6723cacb0c08cbf3f1044e8639a89ebc23d7;hp=0546aff137fb9c23d704eb655dd01de5b0526149;hpb=16e9ef297b4cf15a61876abcc794e5a058500e4b;p=privoxy.git diff --git a/doc/webserver/user-manual/appendix.html b/doc/webserver/user-manual/appendix.html index 0546aff1..b02a2549 100644 --- a/doc/webserver/user-manual/appendix.html +++ b/doc/webserver/user-manual/appendix.html @@ -4,16 +4,19 @@ >Appendix + +
Privoxy User Manual | Privoxy 3.0.4 User Manual
Prev | 9. Appendix14. Appendix9.1. Regular Expressions14.1. Regular ExpressionsPrivoxy can use "regular expressions" - in various config files. Assuming support for uses Perl-style "pcre" (Perl - Compatible Regular Expressions) is compiled in, which is the default. Such - configuration directives do not require regular expressions, but they can be - used to increase flexibility by matching a pattern with wild-cards against - URLs. "regular + expressions" in its actions + files and filter file, + through the PCRE and + PCRS libraries.If you are reading this, you probably don't understand what "regular expressions" are, or what they can do. So this will be a very brief - introduction only. A full explanation would require a book ;-) book ;-)Regular expressions provide a language to describe patterns that can be + run against strings of characters (letter, numbers, etc), to see if they + match the string or not. The patterns are themselves (sometimes complex) + strings of literal characters, combined with wild-cards, and other special + characters, called meta-characters. The "Regular expressions" is a way of matching one character - expression against another to see if it matches or not. One of the +>"meta-characters" have + special meanings and are used to build complex patterns to be matched against. + Perl Compatible Regular Expressions are an especially convenient "expressions" is a literal string of readable characters - (letter, numbers, etc), and the other is a complex string of literal - characters combined with wild-cards, and other special characters, called - meta-characters. The "meta-characters" have special meanings and - are used to build the complex pattern to be matched against. Perl Compatible - Regular Expressions is an enhanced form of the regular expression language - with backward compatibility. 
"dialect" of the regular expression language. To make a simple analogy, we do something similar when we use wild-card characters when listing files with the
These are just some of the ones you are likely to use when matching URLs with
/.*/banners/.* - A simple example
that uses the common combination of in the path
somewhere. A now something a little more complex: /.*/adv((er)?ts?|ertis(ing|ements?))?/ - We have several literal forward slashes again (".*", so we are matching against any conceivable sub-path, just so - it matches our expression. The only true literal that must match our pattern is adv"ing" - OR "ements?", which would then match either spelling. /.*/advert[0-9]+\.(gif|jpe?g) - Again another path statement with forward slashes. Anything in the square brackets "[]""[ ]" can be matched. This is using "0-9" is not in the expression anywhere). s/microsoft(?!.com)/MicroSuck/i - This is - a substitution. "MicroSuck" will replace any occurrence of - "microsoft". The "i" at the end of the expression - means ignore case. The "(?!.com)" means - the match should fail if "microsoft" is followed by - ".com". In other words, this acts like a "NOT" - modifier. In case this is a hyperlink, we don't want to break it ;-). We are barely scratching the surface of regular expressions here so that you
can understand the default More reading on Perl Compatible Regular expressions:
http://www.perldoc.com/perl5.6/pod/perlre.html
http://perldoc.perl.org/perlre.html
For information on regular expression based substitutions and their applications
+ in filters, please see the filter file tutorial
+ in this manual. 9.2. 14.2. Privoxy's Internal Pages's Internal Pages Since Alternately, this may be reached at There is a shortcut: http://p.p/, but this
- variation may not work as reliably as the above in some configurations.
+ (But it
+ doesn't provide a fall-back to a real page, in case the request is not
+ sent through Privoxy)
- Show information about the current configuration:
+ Show information about the current configuration, including viewing and
+ editing of actions files:
- Show the client's request headers:
+ Show the browser's request headers:
- Edit the actions list file: - These may be bookmarked for quick reference. These may be bookmarked for quick reference. See next.9.2.1. Bookmarklets14.2.1. BookmarkletsHere are some bookmarklets to allow you to easily access a +> Below are some "bookmarklets" to allow you to easily access a "mini" version of this page. They are designed for MS Internet - Explorer, but should work equally well in Netscape, Mozilla, and other - browsers which support JavaScript. They are designed to run directly from - your bookmarks - not by clicking the links below (although that will work for - testing). version of some of Privoxy's + special pages. They are designed for MS Internet Explorer, but should work + equally well in Netscape, Mozilla, and other browsers which support + JavaScript. They are designed to run directly from your bookmarks - not by + clicking the links below (although that should work for testing).To save them, right-click the link and choose "may not be safe" - just click OK. Then you can run the - Bookmarklet directly from your favourites/bookmarks. For even faster access, + Bookmarklet directly from your favorites/bookmarks. For even faster access, you can put them on the "Links" Enable PrivoxyPrivoxy - Enable Disable PrivoxyPrivoxy - Disable Toggle PrivoxyPrivoxy - Toggle Privoxy (Toggles between enabled and disabled) View Privoxy StatusPrivoxy- View Status +Credit: The site which gave me the general idea for these bookmarklets is +> Credit: The site which gave us the general idea for these bookmarklets is www.bookmarklets.com. They @@ -1028,25 +1072,256 @@ CLASS="SECT2" > 9.3. Anatomy of an Action14.3. Chain of EventsThe way Let's take a quick look at the basic sequence of events when a web page is + requested by your browser and Privoxy applies is on duty:
14.4. Anatomy of an ActionThe way Privoxy applies + actions and filters + to any given URL can be complex, and not always so easy to understand what is happening. And sometimes we need to be able to - see just what PrivoxyPrivoxy is doing - is causing us a problem inadvertantly. It can be a little daunting to look at + is causing us a problem inadvertently. It can be a little daunting to look at the actions and filters files themselves, since they tend to be filled with - "regular expressions" whose consequences are not always - so obvious. regular expressions whose consequences are not + always so obvious. One quick test to see if Privoxy is causing a problem + or not, is to disable it temporarily. This should be the first troubleshooting + step. See the Bookmarklets section on a quick + and easy way to do this (be sure to flush caches afterward!). Looking at the + logs is a good idea too. Privoxy provides the +> also provides the actions - are being applied to any given URL. This is a big help for troubleshooting. - First, enter one URL (or partial URL) at the prompt, and then Privoxy will tell us how the current configuration will handle it. This will not - help with filtering effects from the default.filter file! It - also will not tell you about any other URLs that may be embedded within the - URL you are testing. For instance, images such as ads are expressed as URLs - within the raw page source of HTML pages. So you will only get info for the - actual URL that is pasted into the prompt area -- not any sub-URLs. If you - want to know about embedded URLs like ads, you will have to dig those out of - the HTML source. Use your browser's "View Page Source" option - for this. Or right click on the ad, and grab the URL. Let's look at an example, "+filter" action) from + one of the filter files since this is handled very + differently and not so easy to trap! It also will not tell you about any other + URLs that may be embedded within the URL you are testing. 
For instance, images + such as ads are expressed as URLs within the raw page source of HTML pages. So + you will only get info for the actual URL that is pasted into the prompt area + -- not any sub-URLs. If you want to know about embedded URLs like ads, you + will have to dig those out of the HTML source. Use your browser's "View + Page Source" option for this. Or right click on the ad, and grab the + URL. Let's try an example, google.com, - one section at a time:
This is the top section, and only tells us of the compiled in defaults. This - is basically what Privoxy would do if there - were not any "actions" defined, i.e. it does nothing. Every action - is disabled. This is not particularly informative for our purposes here. OK, - next section:
This is much more informative, and tells us how we have defined our - This is telling us how we have defined our + "actions", and which ones match for our example, - , and + which ones match for our test case, "google.com". The first grouping shows our default - settings, which would apply to all URLs. If you look at your . + Displayed is all the actions that are available to us. Remember, + the + sign denotes "actions""on". - - file, this would be the section just below the "off". So some are "on" here, but many + are "off". Each example we try may provide a slightly different + end result, depending on our configuration directives. The first listing + is any matches for the standard.action file. No hits at + all here on "standard". Then next is "default", or + our default.action file. The large, multi-line listing, + is how the actions are set to match for all URLs, i.e. our default settings. + If you look at your "actions" file, this would be the section + just below the "aliases" section - near the top. This applies to all URLs as signified by the single forward - slash -- section near the top. This will apply to + all URLs as signified by the single forward slash at the end of the listing + -- "/". - .These are the default actions we have enabled. But we can define additional - actions that would be exceptions to these general rules, and then list - specific URLs that these exceptions would apply to. Last match wins. - Just below this then are two explict matches for But we can define additional actions that would be exceptions to these general + rules, and then list specific URLs (or patterns) that these exceptions would + apply to. Last match wins. Just below this then are two explicit matches for + ".google.com". - The first is negating our various cookie blocking actions (i.e. we will allow - cookies here). The second is allowing . The first is negating our previous cookie setting, + which was for "fast-redirects". 
Note - that there is a leading dot here -- "+session-cookies-only" + (i.e. not persistent). So we will allow persistent cookies for google, at + least that is how it is in this example. The second turns + off any + "+fast-redirects" + action, allowing this to take place unmolested. Note that there is a leading + dot here -- ".google.com". This will - match any hosts and sub-domains, in the google.com domain also, such as +>. This will match any hosts and + sub-domains, in the google.com domain also, such as "www.google.com". So, apparently, we have these actions defined - somewhere in the lower part of our actions file, and - . So, apparently, we have these two actions + defined somewhere in the lower part of our default.action + file, and "google.com" is referenced in these sections. is referenced somewhere in these latter + sections.Then, for our user.action file, we again have no hits. + So there is nothing google-specific that we might have added to our own, local + configuration. And now we pull it altogether in the bottom section and summarize how
+ And finally we pull it all together in the bottom section and summarize how
Privoxy is appying all its is applying all its "actions"
@@ -1242,21 +1639,69 @@ WIDTH="100%"
> |
Notice that the only difference from the previous listing is in "fast-redirects" and "session-cookies-only", which are activated specifically for this site in our configuration, and thus show in the "Final Results".
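The "last match wins" rule from the walkthrough can be sketched in a few lines of Python. This is only a simplified model, not Privoxy's actual code: the function name `merge_actions` and the three short lists are made up for illustration, and it assumes, as the walkthrough describes, that the two ".google.com" sections negate "+session-cookies-only" and "+fast-redirects" from the defaults.

```python
# Hypothetical, simplified model of how matching sections combine.
# merge_actions and the lists below are illustrative names, not Privoxy internals.

def merge_actions(sections):
    """Apply sections in file order; a later +/- overrides an earlier one."""
    final = {}
    for actions in sections:
        for item in actions:
            sign, name = item[0], item[1:]
            final[name] = (sign == "+")  # True means "on", False means "off"
    return final

# A tiny subset of the defaults that apply to every URL ("/").
defaults = ["+session-cookies-only", "+fast-redirects", "-block"]
# The two explicit ".google.com" matches described in the walkthrough.
google_first = ["-session-cookies-only"]
google_second = ["-fast-redirects"]

result = merge_actions([defaults, google_first, google_second])
for name in sorted(result):
    print(("+" if result[name] else "-") + name)
```

Swapping the order of the sections changes the outcome, which is why the position of an exception within the actions files matters.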
Now another example, "ad.doubleclick.net"
{ +block +image }
+ { +block +handle-as-image }
.ad.doubleclick.net
- { +block +image }
+ { +block +handle-as-image }
ad*.
- { +block +image }
- .doubleclick.net
Any one of these would have done the trick and blocked this as an unwanted @@ -1309,27 +1755,38 @@ CLASS="QUOTE" CLASS="QUOTE" >"ad.doubleclick.net"
- is done here -- as both a "+block" + and an - an + "+image". The custom alias "+handle-as-image". + The custom alias "+imageblock" does this - for us. just simplifies the process and make + it more readable.One last example. Let's try "http://www.rhapsodyk.net/adsl/HOWTO/""http://www.example.net/adsl/HOWTO/"
. - This one is giving us problems. We are getting a blank page. Hmmm...
Matches for http://www.rhapsodyk.net/adsl/HOWTO/:
+ Matches for http://www.example.net/adsl/HOWTO/:
- { -add-header -block +deanimate-gifs -downgrade +fast-redirects
- +filter{html-annoyances} +filter{js-annoyances} +filter{no-popups}
- +filter{webbugs} +filter{nimda} +filter{banners-by-size} +filter{hal}
- +filter{fun} +hide-forwarded +hide-from{block} +hide-referer{forge}
- -hide-user-agent -image +image-blocker{blank} +no-compression
- +no-cookies-keep -no-cookies-read -no-cookies-set +no-popups
- -vanilla-wafer -wafer }
- /
+ In file: default.action [ View ] [ Edit ]
- { +block +image }
- /ads
+ {-add-header
+ -block
+ -content-type-overwrite
+ -crunch-client-header
+ -crunch-if-none-match
+ -crunch-incoming-cookies
+ -crunch-outgoing-cookies
+ -crunch-server-header
+ +deanimate-gifs
+ -downgrade-http-version
+ +fast-redirects{check-decoded-url}
+ +filter{html-annoyances}
+ +filter{js-annoyances}
+ +filter{kill-popups}
+ +filter{webbugs}
+ +filter{nimda}
+ +filter{banners-by-size}
+ +filter{hal}
+ +filter{fun}
+ -filter-client-headers
+ -filter-server-headers
+ -force-text-mode
+ -handle-as-empty-document
+ -handle-as-image
+ -hide-accept-language
+ -hide-content-disposition
+ +hide-forwarded-for-headers
+ +hide-from-header{block}
+ +hide-referer{forge}
+ -hide-user-agent
+ -inspect-jpegs
+ +kill-popups
+ -overwrite-last-modified
+ +prevent-compression
+ -redirect
+ -send-vanilla-wafer
+ -send-wafer
+ +session-cookies-only
+ +set-image-blocker{blank}
+ -treat-forbidden-connects-like-blocks }
+ /
{ -block } - /adsl - -{ +block +handle-as-image } + /ads
This would turn off all filtering for that site. This would probably be most appropriately put in user.action, for local site exceptions.
Images that are inexplicably being blocked may well be hitting the "+filter{banners-by-size}" rule, which assumes that images of certain sizes are ad banners (this works well most of the time, since these tend to be standardized).
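The blank page in the last example can be reproduced with a small Python sketch. It assumes that a path pattern behaves like a regular expression anchored only at the start of the path, so "/ads" also matches "/adsl/HOWTO/". The `rules` list and the `is_blocked` helper are hypothetical and model only the "+block" and "-block" actions.

```python
import re

# (action, path pattern) pairs in file order; a hypothetical two-rule subset.
rules = [
    ("+block", r"/ads"),   # generic ad rule
    ("-block", r"/adsl"),  # local exception added to fix the blank page
]

def is_blocked(path, rules):
    """Return True if the last matching rule for this path is +block."""
    blocked = False
    for action, pattern in rules:
        # re.match anchors at the start only, so "/ads" also matches "/adsl/...".
        if re.match(pattern, path):
            blocked = (action == "+block")
    return blocked

# With only the generic rule, the HOWTO page is caught by accident.
print(is_blocked("/adsl/HOWTO/", rules[:1]))
# With the exception listed afterwards, the page is no longer blocked.
print(is_blocked("/adsl/HOWTO/", rules))
# Paths under /ads/ are still blocked, since "/adsl" does not match them.
print(is_blocked("/ads/banner.gif", rules))
```

The exception only helps because it comes after the generic rule; listed first, it would be overridden by the later "/ads" match.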
"{fragile}"