-Junkbuster can use "regular expressions" in various config files. Assuming
-support for "pcre" (Perl Compatible Regular Expressions) is compiled in, which
-is the default. Such configuration directives do not require regular
-expressions, but they can be used to increase flexibility by matching a pattern
-with wild-cards against URLs.
-
-If you are reading this, you probably don't understand what "regular
-expressions" are, or what they can do. So this will be a very brief
-introduction only. A full explanation would require a book ;-)
-
-"Regular expressions" is a way of matching one character expression against
-another to see if it matches or not. One of the "expressions" is a literal
-string of readable characters (letter, numbers, etc), and the other is a
-complex string of literal characters combined with wild-cards, and other
-special characters, called meta-characters. The "meta-characters" have special
-meanings and are used to build the complex pattern to be matched against. Perl
-Compatible Regular Expressions is an enhanced form of the regular expression
-language with backward compatibility.
-
-To make a simple analogy, we do something similar when we use wild-card
-characters when listing files with the dir command in DOS. *.* matches all
-filenames. The "special" character here is the asterisk which matches any and
-all characters. We can be more specific and use ? to match just individual
-characters. So "dir file?.text" would match "file1.txt", "file2.txt", etc. We
-are pattern matching, using a similar technique to "regular expressions"!
-
-Regular expressions do essentially the same thing, but are much, much more
-powerful. There are many more "special characters" and ways of building complex
-patterns however. Let's look at a few of the common ones, and then some
-examples:
-
-. - Matches any single character, e.g. "a", "A", "4", ":", or "@".
-
-? - The preceding character or expression is matched ZERO or ONE times. Either/
-or.
-
-+ - The preceding character or expression is matched ONE or MORE times.
-
-* - The preceding character or expression is matched ZERO or MORE times.
-
-\ - The "escape" character denotes that the following character should be taken
-literally. This is used where one of the special characters (e.g. ".") needs to
-be taken literally and not as a special meta-character.
-
-[] - Characters enclosed in brackets will be matched if any of the enclosed
-characters are encountered.
-
-() - parentheses are used to group a sub-expression, or multiple
-sub-expressions.
-
-| - The "bar" character works like an "or" conditional statement. A match is
-successful if the sub-expression on either side of "|" matches.
-
-s/string1/string2/g - This is used to rewrite strings of text. "string1" is
-replaced by "string2" in this example.
-
-These are just some of the ones you are likely to use when matching URLs with
-Junkbuster, and is a long way from a definitive list. This is enough to get us
-started with a few simple examples which may be more illuminating:
-
-/.*/banners/.* - A simple example that uses the common combination of "." and "
-*" to denote any character, zero or more times. In other words, any string at
-all. So we start with a literal forward slash, then our regular expression
-pattern (".*") another literal forward slash, the string "banners", another
-forward slash, and lastly another ".*". We are building a directory path here.
-This will match any file with the path that has a directory named "banners" in
-it. The ".*" matches any characters, and this could conceivably be more forward
-slashes, so it might expand into a much longer looking path. For example, this
-could match: "/eye/hate/spammers/banners/annoy_me_please.gif", or just "/
-banners/annoying.html", or almost an infinite number of other possible
-combinations, just so it has "banners" in the path somewhere.
-
-A now something a little more complex:
-
-/.*/adv((er)?ts?|ertis(ing|ements?))?/ - We have several literal forward
-slashes again ("/"), so we are building another expression that is a file path
-statement. We have another ".*", so we are matching against any conceivable
-sub-path, just so it matches our expression. The only true literal that must
-match our pattern is adv, together with the forward slashes. What comes after
-the "adv" string is the interesting part.
-
-Remember the "?" means the preceding expression (either a literal character or
-anything grouped with "(...)" in this case) can exist or not, since this means
-either zero or one match. So "((er)?ts?|ertis(ing|ements?))" is optional, as
-are the individual sub-expressions: "(er)", "(ing|ements?)", and the "s". The "
-|" means "or". We have two of those. For instance, "(ing|ements?)", can expand
-to match either "ing" OR "ements?". What is being done here, is an attempt at
-matching as many variations of "advertisement", and similar, as possible. So
-this would expand to match just "adv", or "advert", or "adverts", or
-"advertising", or "advertisement", or "advertisements". You get the idea. But
-it would not match "advertizements" (with a "z"). We could fix that by changing
-our regular expression to: "/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/", which
-would then match either spelling.
-
-/.*/advert[0-9]+\.(gif|jpe?g) - Again another path statement with forward
-slashes. Anything in the square brackets "[]" can be matched. This is using
-"0-9" as a shorthand expression to mean any digit one through nine. It is the
-same as saying "0123456789". So any digit matches. The "+" means one or more of
-the preceding expression must be included. The preceding expression here is
-what is in the square brackets -- in this case, any digit one through nine.
-Then, at the end, we have a grouping: "(gif|jpe?g)". This includes a "|", so
-this needs to match the expression on either side of that bar character also. A
-simple "gif" on one side, and the other side will in turn match either "jpeg"
-or "jpg", since the "?" means the letter "e" is optional and can be matched
-once or not at all. So we are building an expression here to match image GIF or
-JPEG type image file. It must include the literal string "advert", then one or
-more digits, and a "." (which is now a literal, and not a special character,
-since it is escaped with "\"), and lastly either "gif", or "jpeg", or "jpg".
-Some possible matches would include: "//advert1.jpg", "/nasty/ads/
-advert1234.gif", "/banners/from/hell/advert99.jpg". It would not match
-"advert1.gif" (no leading slash), or "/adverts232.jpg" (the expression does not
-include an "s"), or "/advert1.jsp" ("jsp" is not in the expression anywhere).
-
-s/microsoft(?!.com)/MicroSuck/i - This is a substitution. "MicroSuck" will
-replace any occurrence of "microsoft". The "i" at the end of the expression
-means ignore case. The "(?!.com)" means the match should fail if "microsoft" is
-followed by ".com". In other words, this acts like a "NOT" modifier. In case
-this is a hyperlink, we don't want to break it ;-).
-
-We are barely scratching the surface of regular expressions here so that you
-can understand the default Junkbuster configuration files, and maybe use this
-knowledge to customize your own installation. There is much, much more that can
-be done with regular expressions. Now that you know enough to get started, you
-can learn more on your own :/
-
-More reading on Perl Compatible Regular expressions: http://www.perldoc.com/
-perl5.6/pod/perlre.html
-
--------------------------------------------------------------------------------
-
+ Junkbuster can use "regular expressions" in various config files,
+ assuming support for "pcre" (Perl Compatible Regular Expressions) is
+ compiled in, which is the default. Such configuration directives do
+ not require regular expressions, but they can be used to increase
+ flexibility by matching a pattern with wild-cards against URLs.
+
+ If you are reading this, you probably don't understand what "regular
+ expressions" are, or what they can do. So this will be a very brief
+ introduction only. A full explanation would require a book ;-)
+
+ A "regular expression" is a way of matching one character string
+ against another to see if it matches or not. One of the "expressions"
+ is a literal string of readable characters (letters, numbers, etc.),
+ and the other is a complex string of literal characters combined with
+ wild-cards and other special characters, called meta-characters. The
+ "meta-characters" have special meanings and are used to build the
+ complex pattern to be matched against. Perl Compatible Regular
+ Expressions is an enhanced form of the regular expression language
+ with backward compatibility.
+
+ To make a simple analogy, we do something similar when we use
+ wild-card characters to list files with the dir command in DOS.
+ *.* matches all filenames. The "special" character here is the
+ asterisk which matches any and all characters. We can be more specific
+ and use ? to match just individual characters. So "dir file?.txt"
+ would match "file1.txt", "file2.txt", etc. We are pattern matching,
+ using a similar technique to "regular expressions"!
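
The wild-card analogy can be tried out without DOS. The following is
only a sketch, using Python's fnmatch module as a stand-in for the
shell's wild-card matching; the file names are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical directory listing, invented for this example
names = ["file1.txt", "file2.txt", "file10.txt", "notes.txt"]

# "?" stands for exactly one character, so "file10.txt" is left out
single = [n for n in names if fnmatchcase(n, "file?.txt")]

# "*" stands for any run of characters
anything = [n for n in names if fnmatchcase(n, "*.*")]

print(single)
print(anything)
```

Note that these are shell-style wild-cards, not regular expressions: in
a regular expression "?" and "*" modify the preceding item instead of
standing for characters themselves.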
+
+ Regular expressions do essentially the same thing, but are much, much
+ more powerful. There are many more "special characters" and many more
+ ways of building complex patterns. Let's look at a few of the common
+ ones, and then some examples:
+
+ . - Matches any single character, e.g. "a", "A", "4", ":", or "@".
+
+ ? - The preceding character or expression is matched ZERO or ONE
+ times. Either/or.
+
+ + - The preceding character or expression is matched ONE or MORE
+ times.
+
+ * - The preceding character or expression is matched ZERO or MORE
+ times.
+
+ \ - The "escape" character denotes that the following character should
+ be taken literally. This is used where one of the special characters
+ (e.g. ".") needs to be taken literally and not as a special
+ meta-character.
+
+ [] - Characters enclosed in brackets will be matched if any of the
+ enclosed characters are encountered.
+
+ () - Parentheses are used to group a sub-expression, or multiple
+ sub-expressions.
+
+ | - The "bar" character works like an "or" conditional statement. A
+ match is successful if the sub-expression on either side of "|"
+ matches.
+
+ s/string1/string2/g - This is used to rewrite strings of text.
+ "string1" is replaced by "string2" in this example.
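
Each of the meta-characters listed above can be tried in isolation.
This is a minimal sketch using Python's re module, whose syntax is close
to (but not identical with) pcre for these basics; the test strings and
the helper name "full" are our own, not anything from Junkbuster.

```python
import re

def full(pattern, text):
    """Return True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

checks = {
    ".": full(r"a.c", "a@c"),
    "?": full(r"colou?r", "color") and full(r"colou?r", "colour"),
    "+": full(r"ab+", "abbb") and not full(r"ab+", "a"),
    "*": full(r"ab*", "a") and full(r"ab*", "abbb"),
    "\\": full(r"a\.c", "a.c") and not full(r"a\.c", "abc"),
    "[]": full(r"[abc]", "b") and not full(r"[abc]", "d"),
    "()|": full(r"(gif|jpe?g)", "jpg") and not full(r"(gif|jpe?g)", "png"),
}
for meta, ok in checks.items():
    print(meta, ok)

# s/string1/string2/g roughly corresponds to re.sub, which replaces
# every match by default
rewritten = re.sub(r"string1", "string2", "string1 and string1")
print(rewritten)
```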
+
+ These are just some of the ones you are likely to use when matching
+ URLs with Junkbuster, and are a long way from a definitive list. This
+ is enough to get us started with a few simple examples which may be
+ more illuminating:
+
+ /.*/banners/.* - A simple example that uses the common combination of
+ "." and "*" to denote any character, zero or more times. In other
+ words, any string at all. So we start with a literal forward slash,
+ then our regular expression pattern (".*"), another literal forward
+ slash, the string "banners", another forward slash, and lastly another
+ ".*". We are building a directory path here. This will match any file
+ whose path has a directory named "banners" in it. The ".*"
+ matches any characters, and this could conceivably be more forward
+ slashes, so it might expand into a much longer looking path. For
+ example, this could match:
+ "/eye/hate/spammers/banners/annoy_me_please.gif", or just
+ "/ads/banners/annoying.html", or almost an infinite number of other
+ possible combinations, as long as "/banners/" follows an earlier
+ forward slash. Read strictly, the pattern would not match
+ "/banners/annoying.html", which has only one slash before "banners".
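
To see how this pattern behaves, here is a small sketch in Python. It
assumes the pattern is applied from the start of the path (re.match);
how Junkbuster itself anchors patterns is not shown here, so treat the
results as those of Python's re module only. All paths except the first
are made up for illustration.

```python
import re

pattern = re.compile(r"/.*/banners/.*")

paths = [
    "/eye/hate/spammers/banners/annoy_me_please.gif",
    "/ads/banners/annoying.html",
    "//banners/annoying.html",   # ".*" may match nothing at all
    "/banners/annoying.html",    # only one slash before "banners"
    "/images/logo.gif",
]

# Map each path to True/False depending on whether the pattern matches
results = {p: pattern.match(p) is not None for p in paths}
for path, ok in results.items():
    print(ok, path)
```

Under these rules the pattern needs a slash, then anything (possibly
nothing), then "/banners/", so a path that begins directly with
"/banners/" is not matched in this sketch.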
+
+ And now something a little more complex:
+
+ /.*/adv((er)?ts?|ertis(ing|ements?))?/ - We have several literal
+ forward slashes again ("/"), so we are building another expression
+ that is a file path statement. We have another ".*", so we are
+ matching against any conceivable sub-path, just so it matches our
+ expression. The only true literal that must match our pattern is adv,
+ together with the forward slashes. What comes after the "adv" string
+ is the interesting part.
+
+ Remember the "?" means the preceding expression (either a literal
+ character or anything grouped with "(...)" in this case) can exist or
+ not, since this means either zero or one match. So
+ "((er)?ts?|ertis(ing|ements?))" is optional, as are the "(er)"
+ sub-expression and each "s" that is followed by "?". The "|" means
+ "or". We have two of those. For instance, "(ing|ements?)" can expand
+ to match either "ing" OR "ements?". What is being done here is an
+ attempt at matching as many variations of "advertisement", and
+ similar, as possible. So this would expand to match just "adv", or
+ "advert", or "adverts", or "advertising", or "advertisement", or
+ "advertisements". You get the idea. But it would not match
+ "advertizements" (with a "z"). We could fix that by changing our
+ regular expression to: "/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/",
+ which would then match either spelling.
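
The two versions of this pattern can be compared side by side. This is
a sketch in Python's re module; the "/site/.../x.gif" wrapper and the
word list are hypothetical, chosen only to put each word between two
forward slashes as the pattern requires.

```python
import re

original = re.compile(r"/.*/adv((er)?ts?|ertis(ing|ements?))?/")
widened = re.compile(r"/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/")

words = ["adv", "advt", "advert", "adverts", "advertising",
         "advertisement", "advertisements", "advertizements", "advice"]

def matched(regex):
    """Return the words that regex accepts as a directory name."""
    return [w for w in words if regex.match(f"/site/{w}/x.gif")]

print(matched(original))
print(matched(widened))
```

The first list stops at "advertisements"; the second also picks up
"advertizements". Neither accepts "advice", since nothing in the
optional group can consume "ice" before the closing slash.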
+
+ /.*/advert[0-9]+\.(gif|jpe?g) - Again another path statement with
+ forward slashes. Anything in the square brackets "[]" can be matched.
+ This is using "0-9" as a shorthand expression to mean any digit zero
+ through nine. It is the same as saying "0123456789". So any digit
+ matches. The "+" means one or more of the preceding expression must be
+ included. The preceding expression here is what is in the square
+ brackets -- in this case, any digit zero through nine. Then, at the
+ end, we have a grouping: "(gif|jpe?g)". This includes a "|", so this
+ needs to match the expression on either side of that bar character.
+ A simple "gif" on one side, and the other side will in turn
+ match either "jpeg" or "jpg", since the "?" means the letter "e" is
+ optional and can be matched once or not at all. So we are building an
+ expression here to match GIF or JPEG image files. It must
+ include the literal string "advert", then one or more digits, and a
+ "." (which is now a literal, and not a special character, since it is
+ escaped with "\"), and lastly either "gif", or "jpeg", or "jpg". Some
+ possible matches would include: "//advert1.jpg",
+ "/nasty/ads/advert1234.gif", "/banners/from/hell/advert99.jpg". It
+ would not match "advert1.gif" (no leading slash), or "/adverts232.jpg"
+ (the expression does not include an "s"), or "/advert1.jsp" ("jsp" is
+ not in the expression anywhere).
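
The matches and non-matches listed above can be checked mechanically.
A minimal sketch in Python follows, again assuming the pattern is tried
from the start of the path; "/x/advert0.jpeg" is an extra case of our
own to show that the digit zero and the "jpeg" spelling are accepted.

```python
import re

pattern = re.compile(r"/.*/advert[0-9]+\.(gif|jpe?g)")

hits = ["//advert1.jpg", "/nasty/ads/advert1234.gif",
        "/banners/from/hell/advert99.jpg", "/x/advert0.jpeg"]
misses = ["advert1.gif", "/adverts232.jpg", "/advert1.jsp"]

# Collect a True/False result for every path in each list
hit_results = [pattern.match(p) is not None for p in hits]
miss_results = [pattern.match(p) is not None for p in misses]

print(hit_results)
print(miss_results)
```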
+
+ s/microsoft(?!.com)/MicroSuck/i - This is a substitution. "MicroSuck"
+ will replace "microsoft" wherever the pattern matches. The "i" at the
+ end of the expression means ignore case. The "(?!.com)" is a negative
+ lookahead: the match should fail if "microsoft" is followed by ".com".
+ In other words, this acts like a "NOT" modifier. (Since the "." is not
+ escaped, it stands for any single character here; "(?!\.com)" would
+ limit it to a literal dot.) In case this is a hyperlink, we don't want
+ to break it ;-).
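
The substitution can be imitated with Python's re.sub. This is only a
sketch: the sample text is invented, and re.sub replaces every match by
default, which may differ from how Junkbuster applies a substitution
that has no "g" flag. The second half shows what the unescaped "." in
the lookahead does.

```python
import re

text = "Microsoft said so; see www.microsoft.com for details."

# Same pattern as in the example; re.IGNORECASE plays the role of "i"
as_written = re.sub(r"microsoft(?!.com)", "MicroSuck", text,
                    flags=re.IGNORECASE)
print(as_written)

# The "." in the lookahead matches any character, so "-com" also
# blocks the replacement; escaping the dot restricts it to ".com"
loose = re.sub(r"microsoft(?!.com)", "MicroSuck", "microsoft-com")
strict = re.sub(r"microsoft(?!\.com)", "MicroSuck", "microsoft-com")
print(loose)
print(strict)
```

The host name in the sample URL is left alone, while the plain mention
of the company name is rewritten.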
+
+ We are barely scratching the surface of regular expressions here so
+ that you can understand the default Junkbuster configuration files,
+ and maybe use this knowledge to customize your own installation. There
+ is much, much more that can be done with regular expressions. Now that
+ you know enough to get started, you can learn more on your own :/
+
+ More reading on Perl Compatible Regular expressions:
+ [70]http://www.perldoc.com/perl5.6/pod/perlre.html
+ _________________________________________________________________
+