X-Git-Url: http://www.privoxy.org/gitweb/?p=privoxy.git;a=blobdiff_plain;f=doc%2Fsource%2Fuser-manual.sgml;h=a07188ad234f40cc383ec53efa7498b8d54214c1;hp=5266cd4bdf4837299ed6300ebac5292642cd1f14;hb=f8e6e33a77893bf4f7c70353935c7d801f63438a;hpb=d2630aab3cf54d864d02d331b470b39a083fb90d
diff --git a/doc/source/user-manual.sgml b/doc/source/user-manual.sgml
index 5266cd4b..a07188ad 100644
--- a/doc/source/user-manual.sgml
+++ b/doc/source/user-manual.sgml
@@ -9,13 +9,15 @@
+
-
-
+
+
+
-
-
+
+
@@ -28,15 +30,11 @@
Privoxy">
]>
- Copyright &my-copy; 2001-2011 by
- Troubleshooting: Anatomy of an
- Action
has hints on how to understand and debug actions that
- misbehave
.
- problem
sites, and to spend more time adjusting the
- configuration to solve these unintended consequences. In short, there is
- not an easy way to eliminate actions
. Actions
in this context, are
- the directives we use to tell action
. Each
- action has a unique name and function. While there are many potential
- broken image
icon). There are some limitations to this
- though. For instance, you can't just brute-force an image substitution for
- an entire HTML page in most situations.
- invisible
configuration option.
- block
blocks a site, page, or unwanted content. Filters
- are a way of filtering or modifying what is actually on the page. An example
- filter usage: a text replacement of no-no
for
- nasty-word
. That is a very simple example. This process can be
- used for ad blocking, but it is more in the realm of advanced usage and has
- some pitfalls to be wary of.
-actions
file, and click
-
. It is best to put personal or
- local preferences in actions
, and URLs for ad
- blocking or other purposes, and make other adjustments to the configuration.
-
from the
- pop-up menu.
-
:
- Actions:
.
- If not, click a
- button, and in the new section that just appeared, click the
- Actions:
.
- This will bring up a list of all actions. Find
- Enabled
column, then
- just below the list.
-
button, and paste the URL the
- browser got from
.
- Remove the
(or
-
if in a pop-up window).
- patterns
, and
- the entire actions concept, see the Actions
- section.
-advanced
usage category, and are explained in
- depth in later sections.
-Use Proxy
and fill in the appropriate info
- (Address: 127.0.0.1, Port: 8118). Include HTTPS (SSL), if you want HTTPS
- proxy support too (sometimes labeled Secure
). Make sure any
- checkboxes like Use the same proxy server for all protocols
is
- config
in the current directory (except on Win32
- where it will look for config.txt
instead). Specify
- full path to avoid confusion. If no config file is found,
- Toggle Privoxy On or Off
is handy for sites that might
- have problems with your current actions and filters. You can in fact use
- it as a test to see whether it is actions
- relating to banner-blocking, images, pop-ups, content modification, cookie handling
- etc should be applied by default. It should be the first actions file loaded.
- Filter files
(the filter
- file) can be used to re-write the raw page content, including
- viewable text as well as embedded HTML and JavaScript, and whatever else
- lurks on any given web page. The filtering jobs are only pre-defined here;
- whether to apply them or not is up to the actions files.
-
character to denote a
- comment (the rest of the line will be ignored) and understand line continuation
- through placing a backslash ("wake up
requests
- must obviously be sent to the default
setting, may change, so
- please check all your configuration files on important issues.
-actions
relating to banner-blocking, images, pop-ups,
- content modification, cookie handling etc should be applied by default.
- It should be the first actions file loaded
-
- aliases
in an actions file, you have to place the (optional)
- alias section at the top of that file.
- Then comes the default set of rules which will apply universally to all
- sites and pages (be aggressive
your default settings (in the top section of the
- actions file) are, the more exceptions for trusted
sites you
- will have to make later. If, for example, you want to crunch all cookies per
- default, you'll have to make exceptions from that rule for sites that you
- regularly use and that require cookies for actually useful purposes, like maybe
- your bank, favorite shop, or newspaper.
-Cautious
, Medium
or
- Advanced
. Warning: the Advanced
setting is more
- aggressive, and will be more likely to cause problems for some sites.
- Experienced users only!
- alias
sections which will
- be discussed later. For now let's concentrate on regular sections: They have a
- heading line (often split up to multiple lines for readability) which consists
- of a list of actions, separated by whitespace and enclosed in curly braces.
- Below that, there is a list of URL and tag patterns, each on a separate line.
-action file
.
- Every time it matches, the list of applicable actions for the request is
- incrementally updated, using the heading of the section in which the
- pattern is located. The same is done again for tags and tag patterns later on.
-patterns
- to determine what patterns
use wild
- card type Regular
- Expressions
*
represents zero or more arbitrary characters (this is
- equivalent to the
- Regular
- Expression
.*
),
- ?
represents any single character (this is equivalent to the
- regular expression syntax of a simple .
), and you can define
- character classes
in square brackets which is similar to
- the same regular expression technique. All of this can be freely mixed:
-adserver.example.com
,
- ads.example.com
, etc but not sfads.example.com
- modern
POSIX 1003.2
- Regular
- Expressions
/
,
- i.e. it matches as if it would start with a ^
(regular expression speak
- for the beginning of a line).
-(?-i)
switch: .example.com
, since any documents
- within that domain are matched with or without the .*
- regular expression. This is redundant
- example.com
that is
- named index.html
, and that is part of some path. For
- example, it matches www.example.com/testing/index.html
but
- NOT www.example.com/index.html
because the regular
- expression called for at least two /'s
, thus the path
- requirement. It also would match
- www.example.com/testing/index_html
, because of the
- special meta-character .
.
- index.html
regardless of path which in this case can
- have one or more /'s
. And this one must contain exactly
- .html
(but does not have to end with that!).
- example.com
- that contains any of the words ads
, banner
,
- banners
(because of the ?
) or junk
.
- The path does not have to end in these words, just contain them.
- .jpg
, .jpeg
, .gif
or .png
. So this
- one is limited to common image formats.
- TAG:
, so &my-app;
- can tell them apart from URL patterns. Everything after the colon
- including white space, is interpreted as a regular expression with
- path pattern syntax, except that tag patterns aren't left-anchored
- automatically (&my-app; doesn't silently add a ^
,
- you have to do it yourself if you need it).
-foo
- your pattern line should be TAG:^foo$
,
- TAG:foo
would work as well, but it would also
- match requests whose tags contain foo
somewhere.
- TAG: foo
wouldn't work as it requires white space.
-+
, and turned off if preceded with a -
. So a
- do that action
, e.g.
- please block URLs that match the
- following patterns
, and don't
- block URLs that match the following patterns, even if
-
-enabled
or
- disabled
. Syntax:
- actions
are
- taken. So in this case pattern
(because of wildcards and
- regular expressions), and thus to trigger more than one set of actions! Last
- match wins.
-
prefix
- for custom headers.
- HTTP headers
are, you definitely don't need to worry about this
- one.
- BLOCKED
page
- for requests to blocked pages. This page contains the block reason given as
- parameter, a link to find out why the block action applies, and a click-through
- to the blocked content (the latter only if the force feature is available and
- enabled).
- blocking
- banner images and other content through rewriting the relevant URLs in the
- document's HTML source, so they don't get requested in the first place.
- Note that this is a totally different technique, and it's easy to confuse the two.
- X-Forwarded-For:
HTTP header from the client request,
- or adds a new one.
- block
to delete the header.add
to create the header (or append
- the client's IP address to an already existing one).
- sees
- the original.
- Troubleshooting: Anatomy of an
+ Action
has hints on how to understand and debug actions that
+ misbehave
.
+ problem
sites, and to spend more time adjusting the
+ configuration to solve these unintended consequences. In short, there is
+ not an easy way to eliminate actions
. Actions
in this context, are
+ the directives we use to tell action
. Each
+ action has a unique name and function. While there are many potential
+ Content-Type:
HTTP server header.
- Content-Type:
HTTP server header is used by the
- browser to decide what to do with the document. The value of this
- header can cause the browser to open a download menu instead of
- displaying the document by itself, even if the document's format is
- supported by the browser.
- text/html
,
- many browsers treat it as yet another broken HTML document.
- If it is sent as application/xml
, browsers with
- XHTML support will only display it, if the syntax is correct.
- Content-Type: text/html
, you can use &my-app;
- to overwrite it with application/xml
and validate
- the web master's claim inside your XHTML-supporting browser.
- If the syntax is incorrect, the browser will complain loudly.
- text/html
and have it rendered as a broken HTML document.
- Content-Type:
headers that look like some kind of text.
- If you want to overwrite it unconditionally, you have to combine it with
- broken image
icon). There are some limitations to this
+ though. For instance, you can't just brute-force an image substitution for
+ an entire HTML page in most situations.
+ invisible
configuration option.
+ block
blocks a site, page, or unwanted content. Filters
+ are a way of filtering or modifying what is actually on the page. An example
+ filter usage: a text replacement of no-no
for
+ nasty-word
. That is a very simple example. This process can be
+ used for ad blocking, but it is more in the realm of advanced usage and has
+ some pitfalls to be wary of.
+actions
file, and click
+
. It is best to put personal or
+ local preferences in actions
, and URLs for ad
+ blocking or other purposes, and make other adjustments to the configuration.
+
from the
+ pop-up menu.
If-None-Match:
HTTP client header.
+ Find
:
Actions:
.
+ If not, click a
+ button, and in the new section that just appeared, click the
+ Actions:
.
+ This will bring up a list of all actions. Find
+ Enabled
column, then
+ just below the list.
+
button, and paste the URL the
+ browser got from
.
+ Remove the
(or
+
if in a pop-up window).
+ If-None-Match:
HTTP client header
- is useful for filter testing, where you want to force a real
- reload instead of getting status code 304
which
- would cause the browser to use a cached copy of the page.
- If-None-Match:
header shouldn't cause any
- caching problems, as long as the If-Modified-Since:
header
- isn't blocked or missing as well.
- patterns
, and
+ the entire actions concept, see the Actions
+ section.
+advanced
usage category, and are explained in
+ depth in later sections.
+Set-Cookie:
HTTP headers from server replies.
- Cookie:
HTTP headers from client requests.
- Use Proxy
and fill in the appropriate info
+ (Address: 127.0.0.1, Port: 8118). Include HTTPS (SSL), if you want HTTPS
+ proxy support too (sometimes labeled Secure
). Make sure any
+ checkboxes like Use the same proxy server for all protocols
is
+ last
or first
- actions
session cookies
), unless you add them to the
+ configuration. If you want the browser to handle this instead, you will need
+ to edit +filter{popups}
+downgrade-http-version
config option in
+ Actions
+ can be adjusted by pointing your browser to
+ View & Change the Current Configuration
.
+ (This is an internal page and does not require Internet access.)
+actions
that apply
+ to a given URL. In addition to the actions file
+ editor mentioned above, on
and off
(toggled) from this page.
+Contacting the
+ Developers
below.
+simple-check
to just search for the string http://
- to detect redirection URLs.
- check-decoded-url
to decode URLs (if necessary) before searching
- for redirection URLs.
- http://www.example.org/click-tracker.cgi?target=http%3a//www.example.net/
.
+ http://www.example.org/?redirect=http%3a//www.example.net/&foo=bar
.
- contains the redirection URL http://www.example.net/
,
- followed by another parameter. http://www.example.net/&foo=bar
.
- Depending on the target server configuration, the parameter will be silently ignored
- or lead to a page not found
error. You can prevent this problem by
- first using the http://
, either in plain text
- (invalid but often used) or encoded as http%3a//
.
- Some sites use their own URL encoding scheme, encrypt the address
- of the target server or replace it with a database id. In these cases
- config
in the current directory (except on Win32
+ where it will look for config.txt
instead). Specify
+ full path to avoid confusion. If no config file is found,
+ Rolling your own
- filters requires a knowledge of
- Regular
- Expressions
HTML
action
is not available.
- Toggle Privoxy On or Off
is handy for sites that might
+ have problems with your current actions and filters. You can in fact use
+ it as a test to see whether it is actions
+ relating to banner-blocking, images, pop-ups, content modification, cookie handling
+ etc should be applied by default. It should be the first actions file loaded.
Filter files
(the filter
+ file) can be used to re-write the raw page content, including
+ viewable text as well as embedded HTML and JavaScript, and whatever else
+ lurks on any given web page. The filtering jobs are only pre-defined here;
+ whether to apply them or not is up to the actions files.
+
character to denote a
+ comment (the rest of the line will be ignored) and understand line continuation
+ through placing a backslash ("wake up
requests
+ must obviously be sent to the default
setting, may change, so
+ please check all your configuration files on important issues.
+
+]]>
+
+actions
relating to banner-blocking, images, pop-ups,
+ content modification, cookie handling etc should be applied by default.
+ It should be the first actions file loaded
aliasesin an actions file, you have to place the (optional) + alias section at the top of that file. + Then comes the default set of rules which will apply universally to all + sites and pages (be
User-Agent: fetch libfetch/2.0and make sure -# resuming downloads continues to work. -# This way you can continue to use Tor for your normal browsing, -# without overloading the Tor network with your FreeBSD ports updates -# or downloads of bigger files like ISOs. -# Note that HTTP headers are easy to fake and therefore their -# values are as (un)trustworthy as your clients and users. -{+forward-override{forward .} \ - -hide-if-modified-since \ - -overwrite-last-modified \ -} -TAG:^User-Agent: fetch libfetch/2\.0$ -
aggressiveyour default settings (in the top section of the + actions file) are, the more exceptions for
trustedsites you + will have to make later. If, for example, you want to crunch all cookies per + default, you'll have to make exceptions from that rule for sites that you + regularly use and that require cookies for actually useful purposes, like maybe + your bank, favorite shop, or newspaper. +
Cautious,
Mediumor +
Advanced. Warning: the
Advancedsetting is more + aggressive, and will be more likely to cause problems for some sites. + Experienced users only! +
BLOCKED- page, or an empty document will be sent to the client as a substitute for the blocked content. - The
alias sections which will + be discussed later. For now let's concentrate on regular sections: They have a + heading line (often split up to multiple lines for readability) which consists + of a list of actions, separated by whitespace and enclosed in curly braces. + Below that, there is a list of URL and tag patterns, each on a separate line. +
action file. + Every time it matches, the list of applicable actions for the request is + incrementally updated, using the heading of the section in which the + pattern is located. The same is done again for tags and tag patterns later on. +
patterns+ to determine what
patternsuse wild + card type
Regular + Expressions
blocked- page, or a replacement image (as determined by the
Accept-Language:HTTP header in client requests. + Matches any URL because there's no requirement for either the + domain or the path to match anything.
block, or any user defined value. + Matches any URL with the host address
Accept-Language:to decide which one to take by default. - Sometimes it isn't possible to later switch to another language without - changing the
Accept-Language:header first. -
Accept-Language: header to languages you understand, - or to languages that aren't widespread. -
Accept-Language:header - to a rare language, you should consider that it helps to - make your requests unique and thus easier to trace. - If you don't plan to change this header frequently, - you should stick to a common language. + Matches any URL with the host address
Content-Disposition:HTTP header set by some servers. + matches any domain that
*represents zero or more arbitrary characters (this is + equivalent to the +
Regular + Expression
.*), +
?represents any single character (this is equivalent to the + regular expression syntax of a simple
.), and you can define +
character classesin square brackets which is similar to + the same regular expression technique. All of this can be freely mixed: +
block, or any user defined value. + matches
adserver.example.com, +
ads.example.com, etc but not
sfads.example.com
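The wildcard rules described above (`*` behaving like the regular expression `.*`, `?` like `.`, and character classes in square brackets) can be sketched in Python. This is only an illustration of the stated equivalences, not Privoxy's actual matcher: the function names are hypothetical, and the pattern `ad*.example.com` is an assumption, since the pattern itself does not appear in the text above.

```python
import re


def wildcard_to_regex(pattern: str) -> str:
    """Translate a wildcard pattern into an equivalent regular expression.

    '*' becomes '.*', '?' becomes '.', a bracketed character class is
    copied through unchanged, and every other character is escaped.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "*":
            out.append(".*")
        elif ch == "?":
            out.append(".")
        elif ch == "[":
            end = pattern.find("]", i + 1)
            if end == -1:
                # No closing bracket: treat '[' as a literal character.
                out.append(re.escape(ch))
            else:
                out.append(pattern[i:end + 1])
                i = end
        else:
            out.append(re.escape(ch))
        i += 1
    return "".join(out)


def host_matches(pattern: str, host: str) -> bool:
    """Return True if the whole host name matches the wildcard pattern."""
    return re.fullmatch(wildcard_to_regex(pattern), host, re.IGNORECASE) is not None


if __name__ == "__main__":
    for host in ("adserver.example.com", "ads.example.com", "sfads.example.com"):
        print(host, host_matches("ad*.example.com", host))
```

With the assumed pattern, the first two host names match and `sfads.example.com` does not, because the whole name has to match and it does not start with `ad`.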
Content-Disposition:HTTP header for - documents they assume you want to save locally before viewing them. - The
Content-Disposition:header contains the file name - the browser is supposed to use by default. -
Content-Disposition:header helps - to prevent this annoyance, but some browsers additionally check the -
Content-Type:header, before they decide if they can - display a document without saving it first. In these cases, you have - to change this header as well, before the browser stops displaying - download menus. -
modernPOSIX 1003.2 +
Regular + Expressions
/, + i.e. it matches as if it would start with a
^(regular expression speak + for the beginning of a line). +
(?-i)switch:
If-Modified-Since:HTTP client header or modifies its value. + Is equivalent to just
.example.com, since any documents + within that domain are matched with or without the
.*+ regular expression. This is redundant
example.comthat is + named
index.html, and that is part of some path. For + example, it matches
www.example.com/testing/index.htmlbut + NOT
www.example.com/index.htmlbecause the regular + expression called for at least two
/'s, thus the path + requirement. It also would match +
www.example.com/testing/index_html, because of the + special meta-character
.. +
block, or a user defined value that specifies a range of hours. + This regular expression is conditional so it will match any page + named
index.htmlregardless of path which in this case can + have one or more
/'s. And this one must contain exactly +
.html(and end with that!).
304, which would cause the - browser to use a cached copy of the page. -
If-Modified-Since:makes - it less likely that the server can use the time as a cookie replacement, - but you will run into caching problems if the random range is too high. -
example.com+ that contains any of the words
ads,
banner, +
banners(because of the
?) or
junk. + The path does not have to end in these words, just contain them. + The path has to contain at least two slashes (including the one at the beginning).
.jpg,
.jpeg,
.gifor
.png. So this + one is limited to common image formats.
From:HTTP header, or replaces it with the - specified string. -
TAG:, so &my-app; + can tell them apart from other patterns. Everything after the colon + including white space, is interpreted as a regular expression with + path pattern syntax, except that tag patterns aren't left-anchored + automatically (&my-app; doesn't silently add a
^, + you have to do it yourself if you need it). +
foo+ your pattern line should be
TAG:^foo$, +
TAG:foowould work as well, but it would also + match requests whose tags contain
foosomewhere. +
TAG: foowouldn't work as it requires white space. +
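The anchoring behaviour of tag patterns described above can be sketched in Python. This is a minimal illustration that assumes Python's `re` module behaves like the regular expression engine in question for these simple patterns; `tag_pattern_matches` is a hypothetical helper, not part of Privoxy.

```python
import re


def tag_pattern_matches(pattern_line: str, tag: str) -> bool:
    """Check a tag against a 'TAG:' pattern line.

    Everything after the colon, including white space, is used as the
    regular expression. The search is not left-anchored, so the pattern
    needs its own '^' and '$' for an exact match.
    """
    prefix = "TAG:"
    if not pattern_line.startswith(prefix):
        raise ValueError("not a tag pattern line")
    regex = pattern_line[len(prefix):]
    return re.search(regex, tag) is not None


if __name__ == "__main__":
    print(tag_pattern_matches("TAG:^foo$", "foo"))      # exact match
    print(tag_pattern_matches("TAG:foo", "some-foo-tag"))  # substring match
    print(tag_pattern_matches("TAG: foo", "foo"))       # leading space is required
```

Using `re.search` rather than `re.match` is the design choice that mirrors the missing implicit `^`: an unanchored pattern matches anywhere in the tag.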
block, or any user defined value. -
NO-REQUEST-TAG:+ or
NO-RESPONSE-TAG:instead of
TAG:. +
NO-REQUEST-TAG:are checked + after all client headers are scanned, the ones created with
NO-RESPONSE-TAG:+ are checked after all server headers are scanned. In both cases all the created + tags are considered. +
+, and turned off if preceded with a
-. So a +
do that action, e.g. +
please block URLs that match the + following patterns, and
don't + block URLs that match the following patterns, even if++block + previously applied.
enabledor +
disabled. Syntax: +
blockwill completely remove the header - (not to be confused with the
From:headers anymore. -
Referer:(sic) HTTP header from the client request, - or replaces it with a forged one. -
actionsare + taken. So in this case
pattern(because of wildcards and + regular expressions), and thus to trigger more than one set of actions! Last + match wins. +
conditional-blockto delete the header completely if the host has changed.
conditional-forgeto forge the header if the host has changed.
blockto delete the header unconditionally.
forgeto pretend to be coming from the homepage of the server we are talking to.
click path, - but in most cases she could also get that information by comparing - other parts of the log file: for example the User-Agent if it isn't - a very common one, or the user's IP address if it doesn't change between - different requests. -
referreris the - correct English spelling, however the HTTP specification has a bug - it - requires it to be spelled as
referer.) -
User-Agent:HTTP header - in client requests with the specified value. + Sends a user defined HTTP header to the web server.
prefix + for custom headers.X-
HTTP headersare, you definitely don't need to worry about this + one.
BLOCKEDpage + for requests to blocked pages. This page contains the block reason given as + parameter, a link to find out why the block action applies, and a click-through + to the blocked content (the latter only if the force feature is available and + enabled).
https://URLs) through proxies. It works very simply: - the proxy connects to the server on the specified port, and then - short-circuits its connections to the client and to the remote server. - This means CONNECT-enabled proxies can be used as TCP relays very easily. -
blocking+ banner images and other content through rewriting the relevant URLs in the + document's HTML source, so they don't get requested in the first place. + Note that this is a totally different technique, and it's easy to confuse the two. +
X-Forwarded-For:HTTP header from the client request, + or adds a new one.
blockto delete the header.
addto create the header (or append + the client's IP address to an already existing one). +
Last-Modified:HTTP server header or modifies its value. + All client headers to which this action applies are filtered on-the-fly through + the specified regular expression based substitutions.
block,
reset-to-request-time- and
randomize+ The name of a client-header filter, as defined in one of the + filter files.
Last-Modified:header is useful for filter - testing, where you want to force a real reload instead of getting status - code
304, which would cause the browser to reuse the old - version of the page. -
randomizeoption overwrites the value of the -
Last-Modified:header with a randomly chosen time - between the original value and the current time. In theory the server - could send each document with a different
Last-Modified:- header to track visits without using cookies.
Randomize- makes it impossible and the browser can still revalidate cached documents. + Client-header filters are applied to each header on its own, not to + all at once. This makes it easier to diagnose problems, but on the downside + you can't write filters that only change header x if header y's value is z. + You can do that by using tags though.
reset-to-request-timeoverwrites the value of the -
Last-Modified:header with the current time. You could use - this option together with -
randomize. It is safe - to use, as long as the time settings are more or less correct. - If the server sets the
Last-Modified:header to the time - of the request, the random range becomes zero and the value stays the same. - Therefore you should later randomize it a second time with -
sees+ the original.
Content-Type:HTTP server header.
Content-Type:HTTP server header is used by the + browser to decide what to do with the document. The value of this + header can cause the browser to open a download menu instead of + displaying the document by itself, even if the document's format is + supported by the browser.
text/html, + many browsers treat it as yet another broken HTML document. + If it is sent as
application/xml, browsers with + XHTML support will only display it, if the syntax is correct.
Content-Type: text/html, you can use &my-app; + to overwrite it with
application/xmland validate + the web master's claim inside your XHTML-supporting browser. + If the syntax is incorrect, the browser will complain loudly.
text/html and have it rendered as a broken HTML document. +
Content-Type:headers that look like some kind of text. + If you want to overwrite it unconditionally, you have to combine it with +
sees- the original. + This action allows you to block client headers for which no dedicated +
sessioncookies (for the current - browser session
expiresfield from
Set-Cookie:- server headers. Most browsers will not store such cookies permanently and - forget them in between sessions. + Deletes the
If-None-Match:HTTP client header.
If-None-Match:HTTP client header + is useful for filter testing, where you want to force a real + reload instead of getting status code
304which + would cause the browser to use a cached copy of the page.
expires- field. If you use an exotic browser, you might want to try it out to be sure. + It is also useful to make sure the header isn't used as a cookie + replacement (unlikely but possible).
If-None-Match:header shouldn't cause any + caching problems, as long as the
If-Modified-Since:header + isn't blocked or missing as well.
Set-Cookie:HTTP headers from server replies.
patternto send a built-in checkerboard pattern image. The image is visually - decent, scales very well, and makes it obvious where banners were busted. -
blankto send a built-in transparent image. This makes banners disappear - completely, but makes it hard to detect where
to - send a redirect totarget-url
file:///URL. - (But note that not all browsers support redirecting to a local file system). -
blankor
patternin - the first place, but enables your browser to cache the replacement image, instead of requesting - it over and over again. -
http://config.privoxy.org/send-banner?type=, wheretype
blankor
pattern. + This action is only concerned with
auto. It is
actions, known to
aliases, can be defined by combining other actions. - These can in turn be invoked just like the built-in actions. - Currently, an alias name can contain any character except space, tab, -
=, -
{and
}, but we
ato
z, -
0to
9,
+, and
-. - Alias names are not case sensitive, and are not required to start with a -
+or
-sign, since they are merely textually - expanded. -
shop, you can later change your policy on shops in -
shopalias is used. Calling aliases - by their purpose also makes your actions files more readable. -
/pattern): -
shopand
fragileare typically used for -
problemsites that require more than one action to be disabled - in order to function properly. -
, but this pattern - matches all URLs. Therefore, the set of - actions used in this/
defaultsection
+- preceding the action name enables the action, a
-disables!). - Also note how this long line has been made more readable by splitting it into - multiple lines with line continuation. -
Cookie:HTTP headers from client requests. +
fragile- sites, i.e. sites that require minimum interference, because they are either - very complex or very keen on tracking you (and have mechanisms in place that - make them unusable for people who avoid being tracked). We will simply use - our pre-defined
lastor
first+
firstis given, the first frame of the animation + is used as the replacement. If
lastis given, the last + frame of the animation is used instead, which probably makes more sense for + most banner animations, but also has the risk of not showing the entire + last frame (if it is only a delta to an earlier frame). +
Number of milliseconds+
blocked- by the
banners. So the above - generic patterns are surprisingly effective. -
nasty-as intended, - but alsoads .nasty-corp.com
downloor -ads .sourcefroge.net
So here come some - well-known exceptions to theads l.some-provider.net.
downloads.sourcefroge.net: Initially, all actions are deactivated, - so it wouldn't get blocked. Then comes the defaults section, which matches the - URL, but just deactivates the
cvsin them. Note that -
simple-checkto just search for the string
http://+ to detect redirection URLs. +
check-decoded-urlto decode URLs (if necessary) before searching + for redirection URLs. +
http://www.example.org/click-tracker.cgi?target=http%3a//www.example.net/. +
http://www.example.org/?redirect=http%3a//www.example.net/&foo=bar. + contains the redirection URL
http://www.example.net/, + followed by another parameter.
http://www.example.net/&foo=bar. + Depending on the target server configuration, the parameter will be silently ignored + or lead to a
page not founderror. You can prevent this problem by + first using the
http://, either in plain text + (invalid but often used) or encoded as
http%3a//. + Some sites use their own URL encoding scheme, encrypt the address + of the target server or replace it with a database id. In these cases +
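The difference between the two parameters, and the trailing-parameter problem in the example URLs above, can be sketched in Python. This is an assumption-laden illustration, not Privoxy's implementation: `find_redirect_target` is a hypothetical helper, and taking the last `http://` occurrence is a simplification chosen for the sketch.

```python
from typing import Optional
from urllib.parse import unquote


def find_redirect_target(url: str, decode: bool) -> Optional[str]:
    """Return the URL embedded in `url`, or None if none is found.

    decode=False only looks for a plain-text 'http://', in the spirit of
    simple-check; decode=True URL-decodes first, in the spirit of
    check-decoded-url, so 'http%3a//' is found as well.
    """
    haystack = unquote(url) if decode else url
    pos = haystack.lower().rfind("http://")
    # Position 0 is the scheme of the outer URL, not an embedded target.
    return haystack[pos:] if pos > 0 else None


if __name__ == "__main__":
    tracker = "http://www.example.org/click-tracker.cgi?target=http%3a//www.example.net/"
    extra = "http://www.example.org/?redirect=http%3a//www.example.net/&foo=bar"
    print(find_redirect_target(tracker, decode=False))
    print(find_redirect_target(tracker, decode=True))
    print(find_redirect_target(extra, decode=True))
```

For the second example URL the extracted target still carries `&foo=bar`, which is the leftover parameter the text warns about.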
copy image location- and pasted the URL below while removing the leading http://, into a -
broken imageicon by the - browser. Use cautiously. -
funtext replacements in
funfiltering specified here. -
Rolling your own+ filters requires a knowledge of +
Regular + Expressions
HTML
actionis not available. +
Content-Type:isn't detected as such. +
blankimage as opposed to the checkerboard pattern for -
/of course matches all URL - paths and patterns: -
Content-Type:first. +
filter file. Once defined, they - can then be invoked as an
action. -
forward .to use a direct connection without any additional proxies.
forward 127.0.0.1:8123to use the HTTP proxy listening at 127.0.0.1 port 8123. +
forward-socks4a 127.0.0.1:9050 .to use the socks4a proxy listening at + 127.0.0.1 port 9050. Replace
forward-socks4awith
forward-socks4+ to use a socks4 connection (with local DNS resolution) instead, use
forward-socks5+ for socks5 connections (with remote DNS resolution). +
forward-socks4a 127.0.0.1:9050 proxy.example.org:8000to use the socks4a proxy + listening at 127.0.0.1 port 9050 to reach the HTTP proxy listening at proxy.example.org port 8000. + Replace
forward-socks4awith
forward-socks4to use a socks4 connection + (with local DNS resolution) instead, use
forward-socks5+ for socks5 connections (with remote DNS resolution). +
forward-webserver 127.0.0.1:80to use the HTTP + server listening at 127.0.0.1 port 80 without adjusting the + request headers. +
User-Agent: fetch libfetch/2.0and make sure +# resuming downloads continues to work. +# +# This way you can continue to use Tor for your normal browsing, +# without overloading the Tor network with your FreeBSD ports updates +# or downloads of bigger files like ISOs. +# +# Note that HTTP headers are easy to fake and therefore their +# values are as (un)trustworthy as your clients and users. +{+forward-override{forward-socks5 10.0.0.2:2222 .} \ + -hide-if-modified-since \ + -overwrite-last-modified \ +} +TAG:^User-Agent: fetch libfetch/2\.0$ +
BLOCKED+ page, or an empty document will be sent to the client as a substitute for the blocked content. + The
Content Typeheader is recognised as a sign - of text-based content, with the exception of
roll - your ownfilters, you should first be familiar with HTML syntax, - and, of course, regular expressions. -
foocould look - like this: -
Regular - Expressions
blocked+ page, or a replacement image (as determined by the
foocontent filter. We have already defined - the heading, but the jobs are still missing. Since all it does is to replace -
foowith
bar, there is only one (trivial) job - needed: -
fooshould be replaced? Our current job will only take - care of the first
fooon each page. For global substitution, - we'll need to add the
Accept-Language:HTTP header in client requests. +
block, or any user defined value. +
Accept-Language:to decide which one to take by default. + Sometimes it isn't possible to later switch to another language without + changing the
Accept-Language:header first. +
Accept-Language:header to languages you understand, + or to languages that aren't widespread. +
Accept-Language:header + to a rare language, you should consider that it helps to + make your requests unique and thus easier to trace. + If you don't plan to change this header frequently, + you should stick to a common language. +
Match an arbitrary number of the element left of myself, this - matches
<script, followed by
document.referrer. The dot needed to - be
document.referrer, if
document.referrerappears somewhere in between. -
Content-Disposition:HTTP header set by some servers. +
eat upall - text in between
<scriptand the
document.referrer, and that the second
</script>- tag. Furthermore, the
document.referrer. Remember the parts of the script from - (and including) the start tag up to (and excluding) the string -
document.referreras
block, or any user defined value. +
document.referrer) replaced by
Content-Disposition:HTTP header for + documents they assume you want to save locally before viewing them. + The
Content-Disposition:header contains the file name + the browser is supposed to use by default. +
Content-Disposition:header helps + to prevent this annoyance, but some browsers additionally check the +
Content-Type:header, before they decide if they can + display a document without saving it first. In these cases, you have + to change this header as well, before the browser stops displaying + download menus. +
document.referrerby -
zero - or more whitespace. The
a single -. Finally,or a double quote
If-Modified-Since:HTTP client header or modifies its value. +
window.statusobject with a dummy assignment - (using a variable name that is hopefully odd enough not to conflict with - real variables in scripts). Thus, it catches many cases where e.g. pointless - descriptions are displayed in the status bar instead of the link target when - you move your mouse over links. -
block, or a user defined value that specifies a range of hours. +
onunloadattribute in -
<body>tags with the dummy word
OnUnload, but the page's - content does. -
304, which would cause the + browser to use a cached copy of the page. +
If-Modified-Since:makes + it less likely that the server can use the time as a cookie replacement, + but you will run into caching problems if the random range is too high. +
.comappears directly following
microsoft- in the page. This prevents links to microsoft.com from being trashed, while - still replacing the word everywhere else. -
From:HTTP header, or replaces it with the + specified string. +
block, or any user defined value. +
blockwill completely remove the header + (not to be confused with the
From:headers anymore. +
Referer:(sic) HTTP header from the client request, + or replaces it with a forged one. +
conditional-blockto delete the header completely if the host has changed.
exit consoles, i.e. - nasty windows that pop up when you close another one. -
conditional-forgeto forge the header if the host has changed.
blockto delete the header unconditionally.
forgeto pretend to be coming from the homepage of the server we are talking to.
click path, + but in most cases she could also get that information by comparing + other parts of the log file: for example the User-Agent if it isn't + a very common one, or the user's IP address if it doesn't change between + different requests.
referreris the + correct English spelling, however the HTTP specification has a bug - it + requires it to be spelled as
referer.)
unsolicitedpop-up - windows from opening, yet still allow pop-up windows that the user - has explicitly chosen to open. It was added in version 3.0.1, - as an improvement over earlier such filters. -
User-Agent:HTTP header + in client requests with the specified value.
webbugs. -
+enable-https-filtering+ action is used, &my-app; by default verifies that the remote site uses a valid + certificate.
https://URLs) through proxies. It works very simply: + the proxy connects to the server on the specified port, and then + short-circuits its connections to the client and to the remote server. + This means CONNECT-enabled proxies can be used as TCP relays very easily. + +
corners would - appear too early or not at all and as fixing this would require a browser - that understands background-size (CSS3), they are removed instead. + Cookies with a lifetime below the limit are not modified. + The lifetime of session cookies is set to the specified limit. +
0, this action behaves like +
http://www.example.org.foobar.exit/- to access the host
www.example.orgthrough the -
foobar. + When compiled with zlib support (available since &my-app; 3.0.7), content that should be + filtered is decompressed on-the-fly and you don't have to worry about this action. + If you are using an older &my-app; version, or one that hasn't been compiled with zlib + support, this action can be used to convince the server to send the content uncompressed.
www.example.org.foobar.exitas host and uses it - for the
Hostand
Refererheaders. From the - server's point of view the resulting headers are invalid and can cause problems. + Most text-based instances compress very well, the size is seldom decreased by less than 50%, + for markup-heavy instances like news feeds saving more than 90% of the original size isn't + unusual.
Refererheader can trigger
hot-linking- protections, an invalid
Hostheader will make it impossible for - the server to find the right vhost (several domains hosted on the same IP address). + Not using compression will therefore slow down the transfer, and you should only + enable this action if you really need it. As of &my-app; 3.0.7 it's disabled in all + predefined action settings.
foo.exitpart in those headers - to prevent the mentioned problems. Note that it only modifies - the HTTP headers, it doesn't make it impossible for the server - to detect your
404 - No Such Domain- error page
BLOCKED- page
Last-Modified:HTTP server header or modifies its value. +
block,
reset-to-request-time+ and
randomize+
Last-Modified:header is useful for filter + testing, where you want to force a real reload instead of getting status + code
304, which would cause the browser to reuse the old + version of the page. +
randomizeoption overwrites the value of the +
Last-Modified:header with a randomly chosen time + between the original value and the current time. In theory the server + could send each document with a different
Last-Modified:+ header to track visits without using cookies.
Randomize+ makes it impossible and the browser can still revalidate cached documents. +
reset-to-request-timeoverwrites the value of the +
Last-Modified:header with the current time. You could use + this option together with +
randomize. It is safe + to use, as long as the time settings are more or less correct. + If the server sets the
Last-Modified:header to the time + of the request, the random range becomes zero and the value stays the same. + Therefore you should later randomize it a second time with +
regular - expressionsin its actions - files and filter file, - through the
regular - expressionsare, or what they can do. So this will be a very brief - introduction only. A full explanation would require a
meta-charactershave - special meanings and are used to build complex patterns to be matched against. - Perl Compatible Regular Expressions are an especially convenient -
dialectof the regular expression language. -
special- character here is the asterisk which matches any and all characters. We can be - more specific and use
dir file?.txt would match -
file1.txt,
file2.txt, etc. We are pattern - matching, using a similar technique to
regular expressions! -
special charactersand ways of - building complex patterns however. Let's look at a few of the common ones, - and then some examples: -
a, -
A,
4,
:, or
@. -
escapecharacter denotes that - the following character should be taken literally. This is used where one of the - special characters (e.g.
.) needs to be taken literally and - not as a special meta-character. Example:
example\.com, makes - sure the period is recognized only as a period (and not expanded to its - meta-character meaning of any single character). -
[0-9]- matches any numeric digit (zero through nine). As an example, we can combine - this with
+to match any digit one or more times:
[0-9]+. -
barcharacter works like an -
orconditional statement. A match is successful if the - sub-expression on either side of
|matches. As an example: -
/(this|that) example/uses grouping and the bar character - and would match either
this exampleor
that - example, and nothing else. -
.and
*to - denote any character, zero or more times. In other words, any string at all. - So we start with a literal forward slash, then our regular expression pattern - (
.*) another literal forward slash, the string -
banners, another forward slash, and lastly another -
.*. We are building - a directory path here. This will match any file with the path that has a - directory named
bannersin it. The
.*matches - any characters, and this could conceivably be more forward slashes, so it - might expand into a much longer looking path. For example, this could match: -
/eye/hate/spammers/banners/annoy_me_please.gif, or just -
/banners/annoying.html, or almost an infinite number of other - possible combinations, just so it has
bannersin the path - somewhere. -
/), so we are - building another expression that is a file path statement. We have another -
.*, so we are matching against any conceivable sub-path, just so - it matches our expression. The only true literal that
advstring is the - interesting part. -
?means the preceding expression (either a - literal character or anything grouped with
(...)in this case) - can exist or not, since this means either zero or one match. So -
((er)?ts?|ertis(ing|ements?))is optional, as are the - individual sub-expressions:
(er), -
(ing|ements?), and the
s. The
|- means
or. We have two of those. For instance, -
(ing|ements?), can expand to match either
ing-
ements?. What is being done here, is an - attempt at matching as many variations of
advertisement, and - similar, as possible. So this would expand to match just
adv, - or
advert, or
adverts, or -
advertising, or
advertisement, or -
advertisements. You get the idea. But it would not match -
advertizements(with a
z). We could fix that by - changing our regular expression to: -
/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/, which would then match - either spelling. -
[ ]can be matched. This is using
0-9 as a - shorthand expression to mean any digit zero through nine. It is the same as - saying
0123456789. So any digit matches. The
+- means one or more of the preceding expression must be included. The preceding - expression here is what is in the square brackets -- in this case, any digit - zero through nine. Then, at the end, we have a grouping:
(gif|jpe?g). - This includes a
|, so this needs to match the expression on - either side of that bar character also. A simple
gifon one side, and the other - side will in turn match either
jpegor
jpg, - since the
?means the letter
e is optional and - can be matched once or not at all. So we are building an expression here to - match a GIF or JPEG image file. It must include the literal - string
advert, then one or more digits, and a
.- (which is now a literal, and not a special character, since it is escaped - with
\), and lastly either
gif, or -
jpeg, or
jpg. Some possible matches would - include:
//advert1.jpg, -
/nasty/ads/advert1234.gif, -
/banners/from/hell/advert99.jpg. It would not match -
advert1.gif(no leading slash), or -
/adverts232.jpg(the expression does not include an -
s), or
/advert1.jsp(
jspis not - in the expression anywhere). -
sees+ the original. +
+-+ + Typical use: +- -http://config.privoxy.org/ + Allow only temporary session cookies (for the current + browser session only).
+-+ Effect: +- -http://config.privoxy.org/show-status + Deletes the expires field from Set-Cookie: + server headers. Most browsers will not store such cookies permanently and + forget them in between sessions.
+-+ + +Type: + ++ +Boolean. ++ Parameter: +- -http://config.privoxy.org/show-version + N/A
+-+ Notes: +- -http://config.privoxy.org/show-request + This is less strict than crunch-incoming-cookies / + crunch-outgoing-cookies and allows you to browse + websites that insist or rely on setting cookies, without compromising your privacy too badly.
-- -http://config.privoxy.org/show-url-info + Most browsers will not permanently store cookies that have been processed by +session-cookies-only and will forget about them between sessions. + This makes profiling cookies useless, but won't break sites which require cookies so + that you can log in for transactions. This is generally turned on for all + sites, and is the recommended setting.
off,
Privoxy- continues to run, but only as a pass-through proxy, with no actions taking - place: -
-- -http://config.privoxy.org/toggle + It makes no sense at all to use session-cookies-only + together with crunch-incoming-cookies or + crunch-outgoing-cookies. If you do, cookies + will be plainly killed.
-- -http://config.privoxy.org/toggle?set=disable + Note that it is up to the browser how it handles such cookies without an expires + field. If you use an exotic browser, you might want to try it out to be sure.
-- -http://config.privoxy.org/toggle?set=enable + This setting also has no effect on cookies that may have been stored + previously by the browser before starting Privoxy. + These would have to be removed manually.
bookmarkletsto allow you to easily access a -
miniversion of some of
Add to Favorites- (IE) or
Add Bookmark(Netscape). You will get a warning that - the bookmark
may not be safe- just click OK. Then you can run the - Bookmarklet directly from your favorites/bookmarks. For even faster access, - you can put them on the
Linksbar (IE) or the
Personal - Toolbar(Netscape), and run them with a single click. -
patternto send a built-in checkerboard pattern image. The image is visually + decent, scales very well, and makes it obvious where banners were busted. +
blankto send a built-in transparent image. This makes banners disappear + completely, but makes it hard to detect where
to + send a redirect totarget-url
file:///URL. + (But note that not all browsers support redirecting to a local file system). +
blankor
patternin + the first place, but enables your browser to cache the replacement image, instead of requesting + it over and over again. +
http://config.privoxy.org/send-banner?type=, wheretype
blankor
pattern.
auto. It is
+blockpatterns. If - so, the URL is then blocked, and the remote web server will not be contacted. -
+handle-as-image- and -
+handle-as-empty-document- are then checked, and if there is no match, an - HTML
BLOCKEDpage is sent back to the browser. Otherwise, if - it does match, an image is returned for the former, and an empty text - document for the latter. The type of image would depend on the setting of -
+set-image-blocker- (blank, checkerboard pattern, or an HTTP redirect to an image elsewhere). -
+fast-redirectsaction, - it is then processed. Unwanted parts of the requested URL are stripped. -
+hide-user-agent, - etc.), headers are suppressed or forged as determined by these actions and - their parameters. -
+crunch-incoming-cookies, -
+session-cookies-only, - and
+downgrade-http-version- actions. -
+filteraction - or
+deanimate-gifs- action applies (and the document type fits the action), the rest of the page is - read into memory (up to a configurable limit). Then the filter rules (from -
+filteraction - or
+deanimate-gifs- matches, then
on.) -
actions, known to
aliases, can be defined by combining other actions. + These can in turn be invoked just like the built-in actions. + Currently, an alias name can contain any character except space, tab, +
=, +
{and
}, but we
ato
z, +
0to
9,
+, and
-. + Alias names are not case sensitive, and are not required to start with a +
+or
-sign, since they are merely textually + expanded.
+filteraction) from - one of the filter files since this is handled very - differently and not so easy to trap! It also will not tell you about any other - URLs that may be embedded within the URL you are testing. For instance, images - such as ads are expressed as URLs within the raw page source of HTML pages. So - you will only get info for the actual URL that is pasted into the prompt area - -- not any sub-URLs. If you want to know about embedded URLs like ads, you - will have to dig those out of the HTML source. Use your browser's
View - Page Sourceoption for this. Or right click on the ad, and grab the - URL. + Aliases can be used throughout the actions file, but they
shop, you can later change your policy on shops in +
shopalias is used. Calling aliases + by their purpose also makes your actions files more readable. +
actions, and - which ones match for our test case,
google.com. - Displayed are all the actions that are available to us. Remember, - the
on.
off. So some are
onhere, but many - are
off. Each example we try may provide a slightly different - end result, depending on our configuration directives. -
actionsfile, this would be the - section just below the
aliasessection near the top. This - will apply to all URLs as signified by the single forward slash at the end - of the listing --
/. + ...and put them to use. These sections would appear in the lower part of an + actions file and define exceptions to the default actions (as specified further + up for the
/pattern):
.google.com. The first is negating our previous - cookie setting, which was for
+session-cookies-only- (i.e. not persistent). So we will allow persistent cookies for google, at - least that is how it is in this example. The second turns -
+fast-redirects- action, allowing this to take place unmolested. Note that there is a leading - dot here --
.google.com. This will match any hosts and - sub-domains, in the google.com domain also, such as -
www.google.comor
mail.google.com. But it would not - match
www.google.de! So, apparently, we have these two actions - defined as exceptions to the general rules at the top somewhere in the lower - part of our
google.comis referenced somewhere in these latter sections. -
shopand
fragileare typically used for +
problemsites that require more than one action to be disabled + in order to function properly.
actions- to
google.com: - + The above chapters have shown which actions files + there are and how they are organized, how actions are specified and applied + to URLs, how patterns work, and how to + define and use aliases. Now, let's look at an + example
fast-redirectsand
session-cookies-only, - which are activated specifically for this site in our configuration, - and thus show in the
Final Results. + While the
, but this pattern + matches all URLs. Therefore, the set of + actions used in this/
defaultsection
ad.doubleclick.net: + Again, at the start of matching, all actions are disabled, so there is + no need to disable any actions here. (Remember: a
++ preceding the action name enables the action, a
-disables!). + Also note how this long line has been made more readable by splitting it into + multiple lines with line continuation.
+block{}sections, - and a
+block{} +handle-as-image, - which is the expanded form of one of our aliases that had been defined as: -
+block-as-image. (
Aliasesare defined in - the first section of the actions file and typically used to combine more - than one action.) + The default behavior is now set.
ad.doubleclick.net- is done here -- as both a
+block{}-
+handle-as-image. - The custom alias
just - simplifies the process and makes it more readable. + If you aren't a developer, there's no need for you to edit the ++block-as-image
http://www.example.net/adsl/HOWTO/. - This one is giving us problems. We are getting a blank page. Hmmm ... + Understanding the
/adsl/is matching
/adsin our - configuration! But we did not want this at all! Now we see why we get the - blank page. It is actually triggering two different actions here, and - the effects are aggregated so that the URL is blocked, and &my-app; is told - to treat the block as if it were an image. But this is, of course, all wrong. - We could now add a new action below this (or better in our own -
{-block}) paths with -
adslin them (remember, last match in the configuration - wins). There are various ways to handle such exceptions. Example: + After that comes the (optional) alias section. We'll use the example + section from the above chapter on aliases, + that also explains why and how aliases are used:
fragile+ sites, i.e. sites that require minimum interference, because they are either + very complex or very keen on tracking you (and have mechanisms in place that + make them unusable for people who avoid being tracked). We will use + our pre-defined
+filteractions. - These tend to be harder to troubleshoot. - Try adding the URL for the site to one of aliases that turn off -
+filter: + Shopping sites are not as fragile, but they typically + require cookies to log in, and pop-up windows for shopping + carts or item details. Again, we'll use a pre-defined alias:
is an{ shop }
aliasthat expands to -
. - Or you could do your own exception to negate filtering: +{ -filter -session-cookies-only }
+filter{banners-by-size}- rule, which assumes - that images of certain sizes are ad banners (works well -
is an alias that disables most - actions that are the most likely to cause trouble. This can be used as a - last resort for problem sites. + One of the most important jobs of{ fragile }
blocked+ by the
.com). This will effectively match any TLD with -
banners. So the above + generic patterns are surprisingly effective.
nasty-as intended, + but alsoads .nasty-corp.com
downloor +ads .sourcefroge.net
So here come some + well-known exceptions to theads l.some-provider.net.
downloads.sourcefroge.net: Initially, all actions are deactivated, + so it wouldn't get blocked. Then comes the defaults section, which matches the + URL, but just deactivates the
copy image location+ and pasted the URL below while removing the leading http://, into a +
broken imageicon by the + browser. Use cautiously. +
funtext replacements in
funfiltering specified here. +
blankimage as opposed to the checkerboard pattern for +
/of course matches all URL + paths and patterns: +
filter file. Once defined, they + can then be invoked as an
action. +
Content Typeheader is recognised as a sign + of text-based content, with the exception of
roll + your ownfilters, you should first be familiar with HTML syntax, + and, of course, regular expressions. +
foocould look + like this: +
Regular + Expressions
foocontent filter. We have already defined + the heading, but the jobs are still missing. Since all it does is to replace +
foowith
bar, there is only one (trivial) job + needed: +
fooshould be replaced? Our current job will only take + care of the first
fooon each page. For global substitution, + we'll need to add the
Match an arbitrary number of the element left of myself, this + matches
<script, followed by
document.referrer. The dot needed to + be
document.referrer, if
document.referrerappears somewhere in between. +
eat upall + text in between
<scriptand the
document.referrer, and that the second
</script>+ tag. Furthermore, the
document.referrer. Remember the parts of the script from + (and including) the start tag up to (and excluding) the string +
document.referreras
document.referrer) replaced by
document.referrerby +
zero + or more whitespace. The
a single +. Finally,or a double quote
window.statusobject with a dummy assignment + (using a variable name that is hopefully odd enough not to conflict with + real variables in scripts). Thus, it catches many cases where e.g. pointless + descriptions are displayed in the status bar instead of the link target when + you move your mouse over links. +
onunloadattribute in +
<body>tags with the dummy word
OnUnload, but the page's + content does. +
.comappears directly following
microsoft+ in the page. This prevents links to microsoft.com from being trashed, while + still replacing the word everywhere else. +
exit consoles, i.e. + nasty windows that pop up when you close another one. +
unsolicitedpop-up + windows from opening, yet still allow pop-up windows that the user + has explicitly chosen to open. It was added in version 3.0.1, + as an improvement over earlier such filters. +
webbugs. +
corners would + appear too early or not at all and as fixing this would require a browser + that understands background-size (CSS3), they are removed instead. +
http://www.example.org.foobar.exit/+ to access the host
www.example.orgthrough the +
foobar. +
www.example.org.foobar.exitas host and uses it + for the
Hostand
Refererheaders. From the + server's point of view the resulting headers are invalid and can cause problems. +
Refererheader can trigger
hot-linking+ protections, an invalid
Hostheader will make it impossible for + the server to find the right vhost (several domains hosted on the same IP address). +
foo.exitpart in those headers + to prevent the mentioned problems. Note that it only modifies + the HTTP headers, it doesn't make it impossible for the server + to detect your
404 - No Such Domain+ error page
BLOCKED+ page
regular + expressionsin its actions + files and filter file, + through the
regular + expressionsare, or what they can do. So this will be a very brief + introduction only. A full explanation would require a
meta-charactershave + special meanings and are used to build complex patterns to be matched against. + Perl Compatible Regular Expressions are an especially convenient +
dialectof the regular expression language. +
special+ character here is the asterisk which matches any and all characters. We can be + more specific and use
dir file?.txt would match +
file1.txt,
file2.txt, etc. We are pattern + matching, using a similar technique to
regular expressions! +
special charactersand ways of + building complex patterns however. Let's look at a few of the common ones, + and then some examples: +
a, +
A,
4,
:, or
@. +
escapecharacter denotes that + the following character should be taken literally. This is used where one of the + special characters (e.g.
.) needs to be taken literally and + not as a special meta-character. Example:
example\.com, makes + sure the period is recognized only as a period (and not expanded to its + meta-character meaning of any single character). +
[0-9]+ matches any numeric digit (zero through nine). As an example, we can combine + this with
+to match any digit one or more times:
[0-9]+. +
barcharacter works like an +
orconditional statement. A match is successful if the + sub-expression on either side of
|matches. As an example: +
/(this|that) example/uses grouping and the bar character + and would match either
this exampleor
that + example, and nothing else. +
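
The character-class and alternation examples above can be tried out with a short sketch. It uses Python's `re` module as a stand-in for the PCRE dialect, so it only approximates what Privoxy itself does. The helper name `matches_exactly` is hypothetical and not part of Privoxy.

```python
import re


def matches_exactly(pattern: str, text: str) -> bool:
    """Return True only if the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None


# A character class combined with "+": one or more digits.
print(matches_exactly(r"[0-9]+", "2024"))
print(matches_exactly(r"[0-9]+", "20x4"))

# Grouping plus the bar character: either alternative, and nothing else.
for candidate in ("this example", "that example", "those example"):
    print(candidate, "->", matches_exactly(r"(this|that) example", candidate))
```

`re.fullmatch` is used here so that "nothing else" is enforced. A search-style match would also accept longer strings that merely contain the pattern.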
.and
*to + denote any character, zero or more times. In other words, any string at all. + So we start with a literal forward slash, then our regular expression pattern + (
.*) another literal forward slash, the string +
banners, another forward slash, and lastly another +
.*. We are building + a directory path here. This will match any file with the path that has a + directory named
bannersin it. The
.*matches + any characters, and this could conceivably be more forward slashes, so it + might expand into a much longer looking path. For example, this could match: +
/eye/hate/spammers/banners/annoy_me_please.gif, or just +
/banners/annoying.html, or almost an infinite number of other + possible combinations, just so it has
bannersin the path + somewhere. +
/), so we are + building another expression that is a file path statement. We have another +
.*, so we are matching against any conceivable sub-path, just so + it matches our expression. The only true literal that
advstring is the + interesting part. +
?means the preceding expression (either a + literal character or anything grouped with
(...)in this case) + can exist or not, since this means either zero or one match. So +
((er)?ts?|ertis(ing|ements?))is optional, as are the + individual sub-expressions:
(er), +
(ing|ements?), and the
s. The
|+ means
or. We have two of those. For instance, +
(ing|ements?), can expand to match either
ing+
ements?. What is being done here, is an + attempt at matching as many variations of
advertisement, and + similar, as possible. So this would expand to match just
adv, + or
advert, or
adverts, or +
advertising, or
advertisement, or +
advertisements. You get the idea. But it would not match +
advertizements(with a
z). We could fix that by + changing our regular expression to: +
/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/, which would then match + either spelling. +
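
The following sketch checks the two expressions discussed above against the listed spellings. It uses Python's `re` module as an approximation of PCRE. The directory name `/x/` and the helper `path_matches` are assumptions made for illustration only, and `re.match` is used to mimic a pattern anchored at the start of the path.

```python
import re

# The expression from the text, and the variant that also accepts "z".
ORIGINAL = re.compile(r"/.*/adv((er)?ts?|ertis(ing|ements?))?/")
EXTENDED = re.compile(r"/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/")


def path_matches(pattern: re.Pattern, path: str) -> bool:
    """Return True if `pattern` matches starting at the beginning of `path`."""
    return pattern.match(path) is not None


WORDS = [
    "adv", "advert", "adverts",
    "advertising", "advertisement", "advertisements",
    "advertizements",
]

for word in WORDS:
    path = f"/x/{word}/"  # hypothetical path with one leading directory
    print(path, path_matches(ORIGINAL, path), path_matches(EXTENDED, path))
```

Only the last word should differ between the two columns, since `(s|z)` is the only change between the expressions.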
[ ]can be matched. This is using
0-9 as a + shorthand expression to mean any digit zero through nine. It is the same as + saying
0123456789. So any digit matches. The
++ means one or more of the preceding expression must be included. The preceding + expression here is what is in the square brackets -- in this case, any digit + zero through nine. Then, at the end, we have a grouping:
(gif|jpe?g). + This includes a
|, so this needs to match the expression on + either side of that bar character also. A simple
gifon one side, and the other + side will in turn match either
jpegor
jpg, + since the
?means the letter
e is optional and + can be matched once or not at all. So we are building an expression here to + match a GIF or JPEG image file. It must include the literal + string
advert, then one or more digits, and a
.+ (which is now a literal, and not a special character, since it is escaped + with
\), and lastly either
gif, or +
jpeg, or
jpg. Some possible matches would + include:
//advert1.jpg, +
/nasty/ads/advert1234.gif, +
/banners/from/hell/advert99.jpg. It would not match +
advert1.gif(no leading slash), or +
/adverts232.jpg(the expression does not include an +
s), or
/advert1.jsp(
jspis not + in the expression anywhere). +
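
The matches and non-matches listed above can be reproduced with a small sketch. The expression is reconstructed from the description (literal `advert`, one or more digits, an escaped dot, then `gif`, `jpeg` or `jpg`). The leading `/.*/` is an assumption inferred from the example paths. Python's `re` module stands in for PCRE here, and `ADVERT_IMAGE` and `is_advert_image` are hypothetical names.

```python
import re

# Reconstructed expression; the "/.*/" prefix is inferred, not quoted.
ADVERT_IMAGE = re.compile(r"/.*/advert[0-9]+\.(gif|jpe?g)")


def is_advert_image(path: str) -> bool:
    """Return True if the pattern matches from the start of `path`."""
    return ADVERT_IMAGE.match(path) is not None


EXAMPLES = [
    "//advert1.jpg",
    "/nasty/ads/advert1234.gif",
    "/banners/from/hell/advert99.jpg",
    "advert1.gif",        # no leading slash
    "/adverts232.jpg",    # the expression has no "s"
    "/advert1.jsp",       # "jsp" is not one of the alternatives
]

for path in EXAMPLES:
    print(path, "->", is_advert_image(path))
```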
+++ +http://config.privoxy.org/ +
+++ +http://config.privoxy.org/client-tags +
+++ +http://config.privoxy.org/show-status +
+++ +http://config.privoxy.org/show-request +
+++ +http://config.privoxy.org/show-url-info +
off,
Privoxy+ continues to run, but only as a pass-through proxy, with no actions taking + place: +
+++ +http://config.privoxy.org/toggle +
+++ +http://config.privoxy.org/toggle?set=disable +
+++ +http://config.privoxy.org/toggle?set=enable +
+blockpatterns. If + so, the URL is then blocked, and the remote web server will not be contacted. +
+handle-as-image+ and +
+handle-as-empty-document+ are then checked, and if there is no match, an + HTML
BLOCKEDpage is sent back to the browser. Otherwise, if + it does match, an image is returned for the former, and an empty text + document for the latter. The type of image would depend on the setting of +
+set-image-blocker+ (blank, checkerboard pattern, or an HTTP redirect to an image elsewhere). +
+fast-redirects action, + it is then processed. Unwanted parts of the requested URL are stripped. +
+hide-user-agent, + etc.), headers are suppressed or forged as determined by these actions and + their parameters. +
+crunch-incoming-cookies, +
+session-cookies-only, + and
+downgrade-http-version+ actions. +
+filter action + or
+deanimate-gifs+ action applies (and the document type fits the action), the rest of the page is + read into memory (up to a configurable limit). Then the filter rules (from +
+filter action + or
+deanimate-gifs+ matches, then
on.) +
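The handling of a blocked URL described in the steps above can be summarized as a small decision function. This is a minimal sketch, not Privoxy's implementation: the names BlockResponse and choose_block_response are hypothetical, and the order in which the two flags are checked when both apply is an assumption that the text does not state.

```python
from enum import Enum


class BlockResponse(Enum):
    """Kinds of reply sent to the browser for a blocked URL (hypothetical names)."""
    HTML_BLOCKED_PAGE = "html"
    IMAGE = "image"
    EMPTY_DOCUMENT = "empty"


def choose_block_response(handle_as_image: bool,
                          handle_as_empty_document: bool) -> BlockResponse:
    """Pick the reply type for a URL that already matched a +block pattern."""
    # Assumption: the image check comes first if both flags are set.
    if handle_as_image:
        return BlockResponse.IMAGE
    if handle_as_empty_document:
        return BlockResponse.EMPTY_DOCUMENT
    # Neither action matched: fall back to the HTML BLOCKED page.
    return BlockResponse.HTML_BLOCKED_PAGE
```

Which image is actually sent in the IMAGE case depends on the +set-image-blocker setting and is left out of the sketch.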
+filter action) from + one of the filter files since this is handled very + differently and is not as easy to trap! It also will not tell you about any other + URLs that may be embedded within the URL you are testing. For instance, images + such as ads are expressed as URLs within the raw page source of HTML pages. So + you will only get info for the actual URL that is pasted into the prompt area + -- not any sub-URLs. If you want to know about embedded URLs like ads, you + will have to dig those out of the HTML source. Use your browser's
View + Page Source option for this. Or right-click on the ad, and grab the + URL. +
actions, and + which ones match for our test case,
google.com. + Displayed are all the actions that are available to us. Remember, + the
on.
off. So some are
on here, but many + are
off. Each example we try may provide a slightly different + end result, depending on our configuration directives. +
actions file, this would be the + section just below the
aliases section near the top. This + will apply to all URLs as signified by the single forward slash at the end + of the listing --
/. +
.google.com. The first is negating our previous + cookie setting, which was for
+session-cookies-only+ (i.e. not persistent). So we will allow persistent cookies for google, at + least that is how it is in this example. The second turns +
+fast-redirects+ action, allowing this to take place unmolested. Note that there is a leading + dot here --
.google.com. This will also match any hosts and + sub-domains in the google.com domain, such as +
www.google.com or
mail.google.com. But it would not + match
www.google.de! So, apparently, we have these two actions + defined as exceptions to the general rules at the top somewhere in the lower + part of our
google.com is referenced somewhere in these latter sections. +
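The effect of the leading dot in .google.com can be sketched in Python. The function name leading_dot_matches is hypothetical, and the logic is a simplified reading of the behaviour described above (the bare domain and any host below it match, other top-level domains do not), not Privoxy's actual pattern matcher.

```python
def leading_dot_matches(pattern: str, host: str) -> bool:
    """Simplified check of a host against a pattern such as '.google.com'."""
    if not pattern.startswith("."):
        raise ValueError("this sketch only handles patterns with a leading dot")
    domain = pattern[1:].lower()
    host = host.lower()
    # Either the bare domain itself or any host that ends in '.<domain>'.
    return host == domain or host.endswith("." + domain)


if __name__ == "__main__":
    for host in ("google.com", "www.google.com", "mail.google.com", "www.google.de"):
        print(host, leading_dot_matches(".google.com", host))
```

Comparing against "." plus the domain, rather than the domain alone, keeps an unrelated host such as notgoogle.com from matching.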
actions+ to
google.com: +
fast-redirects and
session-cookies-only, + which are activated specifically for this site in our configuration, + and thus show in the
Final Results. +
ad.doubleclick.net: +
+block{} sections, + and a
+block{} +handle-as-image, + which is the expanded form of one of our aliases that had been defined as: +
+block-as-image. (
Aliases are defined in + the first section of the actions file and typically used to combine more + than one action.) +
ad.doubleclick.net+ is done here -- as both a
+block{}+
+handle-as-image. + The custom alias
+block-as-image just + simplifies the process and makes it more readable. +
http://www.example.net/adsl/HOWTO/. + This one is giving us problems. We are getting a blank page. Hmmm ... +
/adsl/is matching
/ads in our + configuration! But we did not want this at all! Now we see why we get the + blank page. It is actually triggering two different actions here, and + the effects are aggregated so that the URL is blocked, and &my-app; is told + to treat the block as if it were an image. But this is, of course, all wrong. + We could now add a new action below this (or better in our own +
{-block}) paths with +
adsl in them (remember, last match in the configuration + wins). There are various ways to handle such exceptions. Example: +
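Why /ads also catches /adsl/, and how a later exception overrides it, can be illustrated with a short Python sketch. The rule list, the is_blocked helper, and the use of re.match as a start-anchored stand-in for path matching are all assumptions made for illustration; this is not the actions file syntax and not how Privoxy is implemented.

```python
import re

# Hypothetical rules in configuration order: (path pattern, block setting).
RULES = [
    ("/ads", True),    # stands in for a +block section listing /ads
    ("/adsl", False),  # stands in for a later {-block} exception
]


def is_blocked(path: str, rules=RULES) -> bool:
    """Apply every matching rule in order; the last match decides."""
    blocked = False
    for pattern, setting in rules:
        # re.match anchors at the start only, so "/ads" also matches "/adsl/...".
        if re.match(pattern, path):
            blocked = setting
    return blocked


if __name__ == "__main__":
    # With only the first rule, the ADSL HOWTO path is blocked by accident.
    print(is_blocked("/adsl/HOWTO/", RULES[:1]))
    # With the later exception in place, the last match wins and it is allowed.
    print(is_blocked("/adsl/HOWTO/"))
```

Because the exception comes later in the list, it overrides the earlier block for paths containing /adsl while leaving ordinary /ads paths blocked.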
+filter actions. + These tend to be harder to troubleshoot. + Try adding the URL for the site to one of the aliases that turn off +
+filter: +
{ shop } is an
alias that expands to +
{ -filter -session-cookies-only }. + Or you could do your own exception to negate filtering: +
+filter{banners-by-size}+ rule, which assumes + that images of certain sizes are ad banners (works well +
{ fragile } is an alias that disables most of the + actions that are most likely to cause trouble. This can be used as a + last resort for problem sites. +
.com). This will effectively match any TLD with +