@@ -165,7 +151,7 @@ FILTER: foo Replace all "foo" with "bar"
started.
-
+
@@ -185,13 +171,13 @@ s/foo/bar/
But wait! Didn't the comment say that
- all occurrences of "foo" should be replaced? Our current job will only take
- care of the first "foo" on each page. For
- global substitution, we'll need to add the g
- option:
+ all occurrences of
+ "foo" should be replaced? Our current job
+ will only take care of the first "foo" on
+ each page. For global substitution, we'll need to add the g option:
-
+
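You can try the effect of the g option outside Privoxy. The sketch below uses Python's `re` module as a stand-in for Privoxy's own regex engine, purely for illustration, and the helper name `apply_job` is hypothetical. With `count=1`, `re.sub` behaves like s/foo/bar/. With the default `count=0`, it behaves like s/foo/bar/g.

```python
import re


def apply_job(page: str, global_sub: bool) -> str:
    """Hypothetical helper emulating s/foo/bar/ (first match) or s/foo/bar/g (every match)."""
    # count=1 stops after the first replacement; count=0 means "replace all".
    return re.sub(r"foo", "bar", page, count=0 if global_sub else 1)


page = "foo and foo and more foo"
print(apply_job(page, global_sub=False))  # bar and foo and more foo
print(apply_job(page, global_sub=True))   # bar and bar and more bar
```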
@@ -203,7 +189,7 @@ s/foo/bar/g
Our complete filter now looks like this:
-
+
@@ -219,7 +205,7 @@ s/foo/bar/g
arise from JavaScript abuse. Let's look at its jobs one after the
other:
-
+
@@ -244,20 +230,20 @@ s|(<script.*)document\.referrer(.*</script>)|$1"Not Your Business!"$2|U
matches any character, and * means:
"Match an arbitrary number of the element left of
myself", this matches "<script",
- followed by any text, i.e. it
- matches the whole page, from the start of the first <script>
- tag.
+ followed by any
+ text, i.e. it matches the whole page, from the start of the first
+ <script> tag.
That's more than we want, but the pattern continues: document\.referrer matches only the exact string
"document.referrer". The dot needed to be
- escaped, i.e. preceded by a
- backslash, to take away its special meaning as a joker, and make it
- just a regular dot. So far, the meaning is: Match from the start of the
- first <script> tag in a the page, up to, and including, the text
- "document.referrer", if both are present in the page (and appear
- in that order).
+ escaped, i.e.
+ preceded by a backslash, to take away its special meaning as a joker,
+ and make it just a regular dot. So far, the meaning is: Match from the
+ start of the first <script> tag in the page, up to, and
+ including, the text "document.referrer", if
+ both are present
+ in the page (and appear in that order).
But there's still more pattern to go. The next element, again
enclosed in parentheses, is .*</script>.
@@ -275,14 +261,14 @@ s|(<script.*)document\.referrer(.*</script>)|$1"Not Your Business!"$2|U
matching, which means that the first .* in the
pattern will only "eat up" all text in
between "<script" and the
- first occurrence of "document.referrer", and that the second .* will only span the text up to the first "</script>" tag. Furthermore, the s option says that the match may span multiple lines in
- the page, and the g option again means that
- the substitution is global.
+ first occurrence of
+ "document.referrer", and that the second
+ .* will only span the text up to the
+ first
+ "</script>" tag. Furthermore, the
+ s option says that the match may span multiple
+ lines in the page, and the g option again
+ means that the substitution is global.
So, to summarize, the pattern means: Match all scripts that contain
the text "document.referrer". Remember the
@@ -295,10 +281,10 @@ s|(<script.*)document\.referrer(.*</script>)|$1"Not Your Business!"$2|U
things? So let's look at the substitute: $1"Not Your
Business!"$2 is easy to read: The text remembered as $1, followed by "Not Your
- Business!" (including
- the quotation marks!), followed by the text remembered as $2. This produces an exact copy of the original string,
- with the middle part (the "document.referrer") replaced by "Not Your Business!".
+ Business!" (including the quotation marks!), followed by the
+ text remembered as $2. This produces an exact
+ copy of the original string, with the middle part (the "document.referrer") replaced by "Not Your Business!".
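Here is a minimal Python sketch of the same job, again using Python's `re` as a stand-in for Privoxy's Perl-compatible engine. Python has no equivalent of the U option, so each `.*` is written as the lazy `.*?`. The other options map as follows:

- `re.DOTALL` plays the role of s.
- `re.sub`'s default of replacing every match plays the role of g.
- `\g<1>` and `\g<2>` stand in for $1 and $2.

The names `REFERRER_JOB` and `hide_referrer` are hypothetical.

```python
import re

# Job from the text: pattern (<script.*)document\.referrer(.*</script>), substitute
# $1"Not Your Business!"$2, with the U, s and g options described above.
# Python has no U (ungreedy) flag, so each .* is written as the lazy .*? instead.
REFERRER_JOB = re.compile(r"(<script.*?)document\.referrer(.*?</script>)", re.DOTALL)  # DOTALL ~ "s"


def hide_referrer(page: str) -> str:
    # \g<1> and \g<2> play the role of $1 and $2; re.sub replaces every match, like "g".
    return REFERRER_JOB.sub(r'\g<1>"Not Your Business!"\g<2>', page)


page = "<p>x</p>\n<script>\nvar r = document.referrer;\n</script>"
print(hide_referrer(page))
# The script line now reads: var r = "Not Your Business!";
```

Text outside a script, such as a paragraph that merely mentions document.referrer, is left alone because the pattern requires a preceding "<script".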
@@ -312,7 +298,7 @@ s|(<script.*)document\.referrer(.*</script>)|$1"Not Your Business!"$2|U
We'll show you two other jobs from the JavaScript taming department,
but this time only point out the constructs of special interest:
-
+
@@ -331,13 +317,14 @@ s/window\.status\s*=\s*(['"]).*?\1/dUmMy=1/ig
.*? makes this matching of arbitrary text ungreedy.
(Note that the U option is not set.) The
['"] construct means: "a
- single or a double
- quote". Finally, \1 is a back-reference
- to the first parenthesis just like $1 above,
- with the difference that in the pattern, a backslash indicates a
- back-reference, whereas in the substitute, it's the dollar.
+ single or a
+ double quote". Finally, \1 is a
+ back-reference to the first parenthesis just like $1 above, with the difference that in the pattern, a backslash
+ indicates a back-reference, whereas in the substitute, it's the
+ dollar.
So what does this job do? It replaces assignments of single- or
double-quoted strings to the "window.status"
@@ -347,7 +334,7 @@ s/window\.status\s*=\s*(['"]).*?\1/dUmMy=1/ig
the status bar instead of the link target when you move your mouse over
links.
-
+
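To see why the back-reference matters, here is a Python sketch of the window.status job. As before, Python's `re` stands in for Privoxy's engine, and the names are hypothetical. Because \1 must match the same quote character that opened the string, a single-quoted string containing double quotes is consumed completely.

```python
import re

# Job from the text: s/window\.status\s*=\s*(['"]).*?\1/dUmMy=1/ig
# (['"]) captures the opening quote; \1 then requires the same quote to close the string.
STATUS_JOB = re.compile(r"""window\.status\s*=\s*(['"]).*?\1""", re.IGNORECASE)  # IGNORECASE ~ "i"


def neuter_status(script: str) -> str:
    # re.sub replaces every match, like the "g" option.
    return STATUS_JOB.sub("dUmMy=1", script)


s = """window.status = 'He said "hi"'; x = 1;"""
print(neuter_status(s))  # dUmMy=1; x = 1;
```

If \1 were replaced by a second ['"], the match could end at the first double quote inside the string and leave a broken fragment behind.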
@@ -362,21 +349,22 @@ s/(<body [^>]*)onunload(.*>)/$1never$2/iU
Including the OnUnload event binding in the HTML DOM was a
- CRIME. When I close a browser
- window, I want it to close and die. Basta. This job replaces the
- "onunload" attribute in "<body>" tags with the dummy word never. Note that the i option
- makes the pattern matching case-insensitive. Also note that ungreedy
- matching alone doesn't always guarantee a minimal match: In the first
- parenthesis, we had to use [^>]* instead of
- .* to prevent the match from exceeding the
- <body> tag if it doesn't contain "OnUnload", but the page's content does.
+ CRIME. When I
+ close a browser window, I want it to close and die. Basta. This job
+ replaces the "onunload" attribute in
+ "<body>" tags with the dummy word
+ never. Note that the i option makes the pattern matching case-insensitive.
+ Also note that ungreedy matching alone doesn't always guarantee a
+ minimal match: In the first parenthesis, we had to use [^>]* instead of .* to
+ prevent the match from exceeding the <body> tag if it doesn't
+ contain "OnUnload", but the page's content
+ does.
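The point about [^>]* versus .* can be checked with a small Python sketch. It makes these assumptions:

- Python's `re` stands in for Privoxy's engine.
- The U option is emulated with lazy quantifiers.
- The names `ONUNLOAD_JOB`, `NAIVE_JOB` and `kill_onunload` are hypothetical.

`NAIVE_JOB` exists only to show what goes wrong when the first group uses .*? instead of [^>]*.

```python
import re

# Job from the text: s/(<body [^>]*)onunload(.*>)/$1never$2/iU
# Python has no U flag, so quantifiers are written lazily (*?); re.IGNORECASE mirrors "i".
ONUNLOAD_JOB = re.compile(r"(<body [^>]*?)onunload(.*?>)", re.IGNORECASE)
# For comparison only: .*? in the first group is NOT confined to the <body> tag.
NAIVE_JOB = re.compile(r"(<body .*?)onunload(.*?>)", re.IGNORECASE)


def kill_onunload(page: str, pattern: re.Pattern = ONUNLOAD_JOB) -> str:
    # No "g" option on this job, so only the first match is replaced.
    return pattern.sub(r"\g<1>never\g<2>", page, count=1)


with_attr = '<body bgcolor="white" onUnload="bye()">Hi</body>'
no_attr = '<body bgcolor="white"><p>Close with onunload handlers!</p>'
print(kill_onunload(with_attr))           # <body bgcolor="white" never="bye()">Hi</body>
print(kill_onunload(no_attr))             # unchanged
print(kill_onunload(no_attr, NAIVE_JOB))  # <body bgcolor="white"><p>Close with never handlers!</p>
```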
The last example is from the fun department:
-
+
@@ -397,7 +385,7 @@ s/microsoft(?!\.com)/MicroSuck/ig
to microsoft.com from being trashed, while still replacing the word
everywhere else.
-
+
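The negative lookahead can be tried the same way. This is a Python sketch with `re` standing in for Privoxy's engine; `FUN_JOB` and `fun_filter` are hypothetical names. (?!\.com) only checks that ".com" does not follow, without consuming any characters itself.

```python
import re

# Job from the text: s/microsoft(?!\.com)/MicroSuck/ig
# (?!\.com) is a negative lookahead: the match fails if ".com" comes next.
FUN_JOB = re.compile(r"microsoft(?!\.com)", re.IGNORECASE)  # IGNORECASE ~ "i"


def fun_filter(text: str) -> str:
    # re.sub replaces every match, like the "g" option.
    return FUN_JOB.sub("MicroSuck", text)


print(fun_filter("Microsoft says: visit microsoft.com"))
# MicroSuck says: visit microsoft.com
```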
@@ -436,7 +424,8 @@ s* industry[ -]leading \
- - js-annoyances
+ - js-annoyances
-
The purpose of this filter is to get rid of particularly
@@ -471,7 +460,8 @@ s* industry[ -]leading \
sites that rely heavily on JavaScript.
- - js-events
+ - js-events
-
This is a very radical measure. It removes virtually all
@@ -479,13 +469,14 @@ s* industry[ -]leading \
to user actions such as mouse movements or clicks, window
resizing, etc., anymore. Use with caution!
- We strongly
- discourage using this filter as a default since it breaks
- many legitimate scripts. It is meant for use only on extra-nasty
- sites (should you really need to go there).
+ We strongly
+ discourage using this filter as a default since it
+ breaks many legitimate scripts. It is meant for use only on
+ extra-nasty sites (should you really need to go there).
- - html-annoyances
+ - html-annoyances
-
This filter will undo many common instances of HTML based
@@ -498,7 +489,8 @@ s* industry[ -]leading \
if specified otherwise.
- - content-cookies
+ - content-cookies
-
Most cookies are set in the HTTP dialog, where they can be
@@ -516,7 +508,8 @@ s* industry[ -]leading \
wherever you would also use the cookie crunch actions.
- - refresh tags
+ - refresh-tags
-
Disable any refresh tags if the interval is greater than nine
@@ -525,8 +518,8 @@ s* industry[ -]leading \
those who find this HTML feature annoying.
- - unsolicited-popups
+ - unsolicited-popups
-
This filter attempts to prevent only
- - all-popups
+ - all-popups
-
-
- Attempt to prevent all pop-up windows from opening.
- Note this should be used with even more discretion than the
- above, since it is more likely to break some sites that require
- pop-ups for normal usage. Use with caution.
+ Attempt to prevent all pop-up windows from opening. Note this
+ should be used with even more discretion than the above, since it
+ is more likely to break some sites that require pop-ups for
+ normal usage. Use with caution.
- - img-reorder
+ - img-reorder
-
This is a helper filter that has no value if used alone. It
@@ -566,7 +561,8 @@ s* industry[ -]leading \
and should be enabled together with them.
- - banners-by-size
+ - banners-by-size
-
This filter removes image tags purely based on what size they
@@ -580,10 +576,12 @@ s* industry[ -]leading \
Recommended only for those who require extreme ad blocking.
The default block rules should catch 95+% of all ads
- without this filter enabled.
+ without this filter
+ enabled.
- - banners-by-link
+ - banners-by-link
-
This is an experimental filter that attempts to kill any
@@ -592,7 +590,8 @@ s* industry[ -]leading \
recommended for use by default.
- - webbugs
+ - webbugs
-
Webbugs are small, invisible images (technically 1X1 GIF
@@ -609,7 +608,8 @@ s* industry[ -]leading \
"webbugs".
- - tiny-textforms
+ - tiny-textforms
-
A rather special-purpose filter that can be used to enlarge
@@ -621,7 +621,8 @@ s* industry[ -]leading \
It is not recommended to use this filter as a default.
- - jumping-windows
+ - jumping-windows
-
Many consider windows that move or resize themselves to be
@@ -630,7 +631,8 @@ s* industry[ -]leading \
using this filter. Use with caution.
- - frameset-borders
+ - frameset-borders
-
Some web designers seem to assume that everyone in the world
@@ -644,7 +646,8 @@ s* industry[ -]leading \
applied to sites which need it.
- - demoronizer
+ - demoronizer
-
Many Microsoft products that generate HTML use non-standard
@@ -661,7 +664,8 @@ s* industry[ -]leading \
fly.
- - shockwave-flash
+ - shockwave-flash
-
A filter for shockwave haters. As the name suggests, this
@@ -669,22 +673,23 @@ s* industry[ -]leading \
shockwave flash objects.
- - quicktime-kioskmode
+ - quicktime-kioskmode
-
Change HTML code that embeds Quicktime objects so that
kioskmode, which prevents saving, is disabled.
- - fun
+ - fun
-
Text replacements for subversive browsing fun. Make fun of
your favorite Monopolist or play buzzword bingo.
- - crude-parental
+ - crude-parental
-
A demonstration-only filter that shows how
- - ie-exploits
+ - ie-exploits
-
An experimental collection of text replacements to disable
@@ -704,7 +710,8 @@ s* industry[ -]leading \
substantial protection.
- - site-specifics
+ - site-specifics
-
Some web sites have very specific problems, the cure for which
@@ -718,28 +725,31 @@ s* industry[ -]leading \
filter.
- - google
+ - google
-
A CSS-based block for Google text ads. It also removes a width
limitation and the toolbar advertisement.
- - yahoo
+ - yahoo
-
Another CSS-based block, this time for Yahoo text ads. It also
removes a width limitation.
- - msn
+ - msn
-
Another CSS-based block, this time for MSN text ads. It also
removes tracking URLs, as well as a width limitation.
- - blogspot
+ - blogspot
-
Cleans up some Blogspot blogs. Read the fine print before
@@ -752,29 +762,32 @@ s* industry[ -]leading \
understands background-size (CSS3), they are removed instead.
- - xml-to-html
+ - xml-to-html
-
Server-header filter to change the Content-Type from xml to
html.
- - html-to-xml
+ - html-to-xml
-
Server-header filter to change the Content-Type from html to
xml.
- - no-ping
+ - no-ping
-
Removes the non-standard ping
attribute from anchor and area HTML tags.
- - hide-tor-exit-notation
+ - hide-tor-exit-notation
-
Client-header filter to remove the Tor
@@ -815,7 +828,7 @@ s* industry[ -]leading \