-Junkbuster can use "regular expressions" in various config files. Assuming
-support for "pcre" (Perl Compatible Regular Expressions) is compiled in, which
-is the default. Such configuration directives do not require regular
-expressions, but they can be used to increase flexibility by matching a pattern
-with wildcards against URLs.
-
-If you are reading this, you probably don't understand what "regular
-expressions" are, or what they can do. So this will be a very brief
-introduction only. A full explanation would require a book ;-)
-
-"Regular expressions" is a way of matching one character expression against
-another to see if it matches or not. One of the "expressions" is a literal
-string of readable characters (letter, numbers, etc), and the other is a
-complex string of literal characters combined with wildcards, and other special
-characters, called metacharacters. The "metacharacters" have special meanings
-and are used to build the complex pattern to be matched against. Perl
-Compatible Regular Expressions is an enhanced form of the regular expression
-language with backward compatibility.
-
-To make a simple analogy, we do something similar when we use wildcard
-characters when listing files with the dir command in DOS. *.* matches all
-filenames. The "special" character here is the asterik which matches any and
-all characters. We can be more specific and use ? to match just individual
-characters. So "dir file?.text" would match "file1.txt", "file2.txt", etc. We
-are pattern matching, using a similar technique to "regular expressions"!
-
-Regular expressions do essentially the same thing, but are much, much more
-powerful. There are many more "special characters" and ways of building complex
-patterns however. Let's look at a few of the common ones, and then some
-examples:
-
-. - Matches any single character, e.g. "a", "A", "4", ":", or "@".
-
-? - The preceding character or expression is matched ZERO or ONE times. Either/
-or.
-
-+ - The preceding character or expression is matched ONE or MORE times.
-
-* - The preceding character or expression is matched ZERO or MORE times.
-
-\ - The "escape" character denotes that the following character should be taken
-literally. This is used where one of the special characters (e.g. ".") needs to
-be taken literally and not as a special metacharacter.
-
-[] - Characters enclosed in brackets will be matched if any of the enclosed
-characters are encountered.
-
-() - Pararentheses are used to group a sub-expression, or multiple
-sub-expressions.
-
-| - The "bar" character works like an "or" conditional statement. A match is
-successful if the sub-expression on either side of "|" matches.
-
-s/string1/string2/g - This is used to rewrite strings of text. "string1" is
-replaced by "string2" in this example.
-
-These are just some of the ones you are likely to use when matching URLs with
-Junkbuster, and is a long way from a definitive list. This is enough to get us
-started with a few simple examples which may be more illuminating:
-
-/.*/banners/.* - A simple example that uses the common combination of "." and "
-*" to denote any character, zero or more times. In other words, any string at
-all. So we start with a literal forward slash, then our regular expression
-pattern (".*") another literal forward slash, the string "banners", another
-forward slash, and lastly another ".*". We are building a directory path here.
-This will match any file with the path that has a directory named "banners" in
-it. The ".*" matches any characters, and this could conceivably be more forward
-slashes, so it might expand into a much longer looking path. For example, this
-could match: "/eye/hate/spammers/banners/annoy_me_please.gif", or just "/
-banners/annoying.html", or almost an infinite number of other possible
-combinations, just so it has "banners" in the path somewhere.
-
-A now something a little more complex:
-
-/.*/adv((er)?ts?|ertis(ing|ements?))?/ - We have several literal forward
-slashes again ("/"), so we are building another expression that is a file path
-statement. We have another ".*", so we are matching against any conceivable
-sub-path, just so it matches our expression. The only true literal that must
-match our pattern is adv, together with the forward slashes. What comes after
-the "adv" string is the interesting part.
-
-Remember the "?" means the preceding expression (either a literal character or
-anything grouped with "(...)" in this case) can exist or not, since this means
-either zero or one match. So "((er)?ts?|ertis(ing|ements?))" is optional, as
-are the individual sub-expressions: "(er)", "(ing|ements?)", and the "s". The "
-|" means "or". We have two of those. For instance, "(ing|ements?)", can expand
-to match either "ing" OR "ements?". What is being done here, is an attempt at
-matching as many variations of "advertisement", and similar, as possible. So
-this would expand to match just "adv", or "advert", or "adverts", or
-"advertising", or "advertisement", or "advertisements". You get the idea. But
-it would not match "advertizements" (with a "z"). We could fix that by changing
-our regular expression to: "/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/", which
-would then match either spelling.
-
-/.*/advert[0-9]+\.(gif|jpe?g) - Again another path statement with forward
-slashes. Anything in the square brackets "[]" can be matched. This is using
-"0-9" as a shorthand expression to mean any digit one through nine. It is the
-same as saying "0123456789". So any digit matches. The "+" means one or more of
-the preceding expression must be included. The preceding expression here is
-what is in the square brackets -- in this case, any digit one through nine.
-Then, at the end, we have a grouping: "(gif|jpe?g)". This includes a "|", so
-this needs to match the expression on either side of that bar character also. A
-simple "gif" on one side, and the other side will in turn match either "jpeg"
-or "jpg", since the "?" means the letter "e" is optional and can be matched
-once or not at all. So we are building an expression here to match image GIF or
-JPEG type image file. It must include the literal string "advert", then one or
-more digits, and a "." (which is now a literal, and not a special character,
-since it is escaped with "\"), and lastly either "gif", or "jpeg", or "jpg".
-Some possible matches would include: "//advert1.jpg", "/nasty/ads/
-advert1234.gif", "/banners/from/hell/advert99.jpg". It would not match
-"advert1.gif" (no leading slash), or "/adverts232.jpg" (the expression does not
-include an "s"), or "/advert1.jsp" ("jsp" is not in the expression anywhere).
-
-s/microsoft(?!.com)/MicroSuck/i - This is a substitution. "MicroSuck" will
-replace any occurence of "microsoft". The "i" at the end of the expression
-means ignore case. The "(?!.com)" means the match should fail if "microsoft" is
-followed by ".com". In other words, this acts like a "NOT" modifier. In case
-this is a hyperlink, we don't want to break it ;-).
-
-We are barely scratching the surface of regular expressions here so that you
-can understand the default Junkbuster configuration files, and maybe use this
-knowledge to customize your own installation. There is much, much more that can
-be done with regular expressions. Now that you know enough to get started, you
-can learn more on your own :/
-
-More reading on Perl Compatible Regular expressions: http://www.perldoc.com/
-perl5.6/pod/perlre.html
-
+ Junkbuster can use "regular expressions" in various config files,
+ provided support for "pcre" (Perl Compatible Regular Expressions) is
+ compiled in, which is the default. The configuration directives do
+ not require regular expressions, but using them adds flexibility,
+ because a single pattern with wildcards can match many URLs.
+
+ If you are reading this, you probably don't yet understand what
+ "regular expressions" are, or what they can do. This is only a very
+ brief introduction; a full explanation would require a book ;-)
+
+ A "regular expression" is a way of matching one string of characters
+ against another to see whether they match. One of the two is a
+ literal string of readable characters (letters, numbers, etc.), and
+ the other is a pattern: a string of literal characters combined with
+ wildcards and other special characters, called metacharacters. The
+ metacharacters have special meanings and are used to build the
+ pattern that the literal string is matched against. Perl Compatible
+ Regular Expressions (PCRE) are an enhanced, backward-compatible form
+ of the regular expression language.
+
+ To make a simple analogy, we do something similar when we use
+ wildcard characters to list files with the dir command in DOS: "*.*"
+ matches all filenames. The "special" character here is the asterisk,
+ which matches any and all characters. We can be more specific and use
+ "?" to match exactly one character. So "dir file?.txt" would match
+ "file1.txt", "file2.txt", etc. We are pattern matching, using a
+ technique similar to "regular expressions"!
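+
+ To see the analogy in action, here is a minimal sketch in Python. This
+ is purely our choice for illustration; Junkbuster itself does not use
+ Python. Python's fnmatch module does shell-style wildcard matching,
+ which is close to what dir does, and its re module understands the
+ Perl-style syntax used in this section. The wildcard "?" corresponds
+ to the regular expression ".", and a literal dot must be written "\.".
```python
import fnmatch
import re

# Shell/DOS-style wildcard: "?" stands for exactly one character.
glob_pattern = "file?.txt"

# Regular expression equivalent: "." is any one character and "\." is a
# literal dot (an unescaped "." would match any character there too).
regex_pattern = r"file.\.txt"

for name in ["file1.txt", "file2.txt", "file10.txt", "file1.doc"]:
    glob_hit = fnmatch.fnmatchcase(name, glob_pattern)
    regex_hit = re.fullmatch(regex_pattern, name) is not None
    print(f"{name:11} wildcard={glob_hit} regex={regex_hit}")
```
+ Both columns agree: "file1.txt" and "file2.txt" match, while
+ "file10.txt" does not, because "?" and "." each stand for exactly one
+ character.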
+
+ Regular expressions do essentially the same thing, but are much, much
+ more powerful. There are, however, many more "special characters" and
+ ways of building complex patterns. Let's look at a few of the common
+ ones, and then some examples:
+
+ . - Matches any single character, e.g. "a", "A", "4", ":", or "@".
+
+ ? - The preceding character or expression is matched ZERO or ONE
+ times; in other words, it is optional.
+
+ + - The preceding character or expression is matched ONE or MORE
+ times.
+
+ * - The preceding character or expression is matched ZERO or MORE
+ times.
+
+ \ - The "escape" character denotes that the following character should
+ be taken literally. This is used where one of the special characters
+ (e.g. ".") needs to be taken literally and not as a special
+ metacharacter.
+
+ [] - Matches any ONE of the characters enclosed in the brackets; for
+ example, "[ae]" matches either "a" or "e".
+
+ () - Parentheses are used to group a sub-expression, or multiple
+ sub-expressions, so that an operator such as "?" applies to the whole
+ group.
+
+ | - The "bar" character works like an "or" conditional statement. A
+ match is successful if the sub-expression on either side of "|"
+ matches.
+
+ s/string1/string2/g - This is used to rewrite strings of text.
+ "string1" is replaced by "string2" in this example; the trailing "g"
+ ("global") replaces every occurrence rather than just the first.
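+
+ The metacharacters above can be tried out with a small sketch, again
+ using Python's re module as a stand-in for PCRE (both handle this
+ basic syntax the same way). The sample words are made up, and the
+ helper name "matches" is hypothetical. re.fullmatch() requires the
+ whole text to match, and re.sub() replaces every match by default,
+ like the "g" flag, or only the first one when given count=1.
```python
import re

def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the WHOLE text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# (pattern, text, expected result); the words are made-up samples.
examples = [
    (r"a.c", "abc", True),        # "." is any single character
    (r"colou?r", "color", True),  # "?": the "u" may appear zero times...
    (r"colou?r", "colour", True), # ...or once
    (r"go+gle", "gooogle", True), # "+": one or more "o"
    (r"go+gle", "ggle", False),   # ...but not zero
    (r"ab*c", "ac", True),        # "*": zero or more "b"
    (r"a\.c", "a.c", True),       # "\." is a literal dot...
    (r"a\.c", "abc", False),      # ...so it no longer matches "b"
    (r"gr[ae]y", "grey", True),   # "[ae]": either "a" or "e"
    (r"(ad)+", "adad", True),     # "(...)": "+" repeats the whole group
    (r"(ad)+", "ada", False),
    (r"cat|dog", "dog", True),    # "|": either side may match
]

for pattern, text, expected in examples:
    print(f"{pattern:8} {text:8} -> {matches(pattern, text)} (expected {expected})")

# s/ads/junk/g versus s/ads/junk/ (no "g"):
text = "ads, more ads"
print(re.sub(r"ads", "junk", text))           # junk, more junk
print(re.sub(r"ads", "junk", text, count=1))  # junk, more ads
```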
+
+ These are just some of the ones you are likely to use when matching
+ URLs with Junkbuster; the list is a long way from complete. It is
+ enough, though, to get us started with a few simple examples, which
+ may be more illuminating:
+
+ /.*/banners/.* - A simple example that uses the common combination of
+ "." and "*" to denote any character, zero or more times; in other
+ words, any string at all. So we start with a literal forward slash,
+ then our regular expression pattern (".*"), another literal forward
+ slash, the string "banners", another forward slash, and lastly
+ another ".*". We are building a directory path here. This will match
+ any path with a directory named "banners" somewhere below the top
+ level. The ".*" matches any characters, and these could include more
+ forward slashes, so it might expand into a much longer path. For
+ example, this could match
+ "/eye/hate/spammers/banners/annoy_me_please.gif", or just
+ "/ads/banners/annoying.html", or almost an infinite number of other
+ possible combinations, as long as "banners" is in the path somewhere.
+ Note that the pattern needs a slash both before and after the first
+ ".*", so "/banners/annoying.html" on its own would not match.
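+
+ A quick way to check the banners pattern is a sketch like the one
+ below. It assumes Python's re.search(), which lets the pattern match
+ anywhere in the path, much like Perl's default matching; how
+ Junkbuster itself anchors its URL patterns is not shown here. The
+ sample paths are only examples.
```python
import re

# The pattern from the text; re.search() may find it anywhere in the path.
BANNERS = re.compile(r"/.*/banners/.*")

# Sample paths for illustration only.
paths = [
    "/eye/hate/spammers/banners/annoy_me_please.gif",
    "/ads/banners/annoying.html",
    "/images/logo.gif",
]

for path in paths:
    print(f"{path} -> {bool(BANNERS.search(path))}")
```
+ The first two paths print True, and "/images/logo.gif" prints False
+ because it has no "banners" directory.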
+
+ And now something a little more complex:
+
+ /.*/adv((er)?ts?|ertis(ing|ements?))?/ - We have several literal
+ forward slashes again ("/"), so we are building another expression
+ that is a file path statement. We have another ".*", so we are
+ matching against any conceivable sub-path, as long as the rest of the
+ expression matches. The only true literals that must match are "adv"
+ and the forward slashes. What comes after the "adv" string is the
+ interesting part.
+
+ Remember that "?" means the preceding expression (either a literal
+ character or anything grouped with "(...)" in this case) may appear
+ once or not at all. So the whole group "((er)?ts?|ertis(ing|ements?))"
+ is optional, as are "(er)" and the two single letters "s" that are
+ followed by "?". The "|" means "or", and we have two of those. For
+ instance, "(ing|ements?)" can expand to match "ing", "ement", or
+ "ements". What is being done here is an attempt at matching as many
+ variations of "advertisement", and similar, as possible. So this
+ would expand to match just "adv", or "advert", or "adverts", or
+ "advertising", or "advertisement", or "advertisements". You get the
+ idea. But it would not match "advertizements" (with a "z"). We could
+ fix that by changing our regular expression to
+ "/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/", which would then match
+ either spelling.
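+
+ The claims about the "adv" pattern can be checked with a minimal
+ Python sketch, again assuming re.search() semantics as a stand-in for
+ PCRE. Each word is wrapped in a hypothetical path ("/site/" before it
+ and "/banner.gif" after it) so that the slashes in the pattern have
+ something to match.
```python
import re

# The original pattern, and the variant that also accepts a "z".
ADV = re.compile(r"/.*/adv((er)?ts?|ertis(ing|ements?))?/")
ADV_Z = re.compile(r"/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/")

words = ["adv", "advert", "adverts", "advertising",
         "advertisement", "advertisements", "advertizements"]

for word in words:
    # "/site/" and "/banner.gif" are a hypothetical path around each word.
    path = f"/site/{word}/banner.gif"
    print(f"{word:15} original={bool(ADV.search(path))} "
          f"with_z={bool(ADV_Z.search(path))}")
```
+ Every word matches both patterns except "advertizements", which only
+ the "(s|z)" variant accepts.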
+
+ /.*/advert[0-9]+\.(gif|jpe?g) - Again another path statement with
+ forward slashes. Any one of the characters in the square brackets
+ "[]" can be matched. This uses "0-9" as shorthand for any digit zero
+ through nine; it is the same as writing "0123456789". The "+" means
+ one or more of the preceding expression must be present. The
+ preceding expression here is what is in the square brackets, so we
+ need one or more digits. Then, at the end, we have a grouping:
+ "(gif|jpe?g)". This includes a "|", so it matches the expression on
+ either side of that bar character: a simple "gif" on one side, and on
+ the other side either "jpeg" or "jpg", since the "?" means the letter
+ "e" is optional and can be matched once or not at all. So we are
+ building an expression here to match a GIF or JPEG image file. It
+ must include the literal string "advert", then one or more digits,
+ then a "." (which is now a literal, and not a special character,
+ since it is escaped with "\"), and lastly either "gif", "jpeg", or
+ "jpg". Some possible matches would include "//advert1.jpg",
+ "/nasty/ads/advert1234.gif", and "/banners/from/hell/advert99.jpg". It
+ would not match "advert1.gif" (no leading slash), "/adverts232.jpg"
+ (the expression does not include an "s"), or "/advert1.jsp" ("jsp" is
+ not in the expression anywhere).
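+
+ The matches and non-matches listed above can be verified with this
+ sketch, again using Python's re.search() as a stand-in for PCRE. It
+ prints which alternative of "(gif|jpe?g)" was captured, and the last
+ two lines use a made-up path, "/ads/advert1xgif", to show why the "\"
+ before the dot matters.
```python
import re

ADVERT_IMG = re.compile(r"/.*/advert[0-9]+\.(gif|jpe?g)")

should_match = ["//advert1.jpg", "/nasty/ads/advert1234.gif",
                "/banners/from/hell/advert99.jpg"]
should_not_match = ["advert1.gif", "/adverts232.jpg", "/advert1.jsp"]

for path in should_match + should_not_match:
    m = ADVERT_IMG.search(path)
    # group(1) is whatever the "(gif|jpe?g)" group captured.
    print(f"{path:32} -> {m.group(1) if m else 'no match'}")

# Without the backslash, "." matches any character, even an "x":
print(bool(re.search(r"/.*/advert[0-9]+.(gif|jpe?g)", "/ads/advert1xgif")))  # True
print(bool(ADVERT_IMG.search("/ads/advert1xgif")))                           # False
```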
+
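+ s/microsoft(?!.com)/MicroSuck/i - This is a substitution. "MicroSuck"
+ will replace the first occurrence of "microsoft" (there is no
+ trailing "g", so later occurrences are left alone). The "i" at the end
+ of the expression means ignore case, so "Microsoft" and "MICROSOFT"
+ match as well. The "(?!.com)" means the match should fail if
+ "microsoft" is followed by ".com"; in other words, it acts like a
+ "NOT" modifier. In case this is part of a hyperlink such as
+ "www.microsoft.com", we don't want to break it ;-). Strictly
+ speaking, the unescaped "." matches any character, so "(?!\.com)"
+ would be more precise.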
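+
+ Here is a minimal sketch of the same substitution in Python, assuming
+ its re module behaves like PCRE for this pattern (it does support the
+ "(?!...)" negative lookahead). re.IGNORECASE stands in for the "i"
+ flag. The function "rewrite" and its "every" parameter are
+ hypothetical names; every=True mimics adding the "g" flag.
```python
import re

# s/microsoft(?!.com)/MicroSuck/i rewritten with Python's re module:
# "(?!.com)" is a negative lookahead (the match fails if "microsoft" is
# followed by any one character plus "com"), and re.IGNORECASE plays the
# role of the "i" flag.
PATTERN = re.compile(r"microsoft(?!.com)", re.IGNORECASE)

def rewrite(text: str, every: bool = False) -> str:
    """Replace the first match, or every match when every=True (like "g")."""
    return PATTERN.sub("MicroSuck", text, count=0 if every else 1)

print(rewrite("Microsoft makes software."))           # MicroSuck makes software.
print(rewrite("Visit www.microsoft.com today."))       # unchanged
print(rewrite("Microsoft and MICROSOFT"))              # MicroSuck and MICROSOFT
print(rewrite("Microsoft and MICROSOFT", every=True))  # MicroSuck and MicroSuck
```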
+
+ We are barely scratching the surface of regular expressions here so
+ that you can understand the default Junkbuster configuration files,
+ and maybe use this knowledge to customize your own installation. There
+ is much, much more that can be done with regular expressions. Now that
+ you know enough to get started, you can learn more on your own :/
+
+ Further reading on Perl Compatible Regular Expressions:
+ [54]http://www.perldoc.com/perl5.6/pod/perlre.html
+
+References
+
+ 1. http://ijbswa.sourceforge.net/user-manual/
+ 2. mailto:ijbswa-developers@lists.sourceforge.net
+ 3. file://localhost/home/swa/sf/current/doc/source/tmp.html#INTRODUCTION
+ 4. file://localhost/home/swa/sf/current/doc/source/tmp.html#AEN27
+ 5. file://localhost/home/swa/sf/current/doc/source/tmp.html#INSTALLATION
+ 6. file://localhost/home/swa/sf/current/doc/source/tmp.html#INSTALLATION-SOURCE
+ 7. file://localhost/home/swa/sf/current/doc/source/tmp.html#INSTALLATION-RH
+ 8. file://localhost/home/swa/sf/current/doc/source/tmp.html#INSTALLATION-SUSE
+ 9. file://localhost/home/swa/sf/current/doc/source/tmp.html#INSTALLATION-OS2
+ 10. file://localhost/home/swa/sf/current/doc/source/tmp.html#INSTALLATION-WIN
+ 11. file://localhost/home/swa/sf/current/doc/source/tmp.html#INSTALLATION-OTHER
+ 12. file://localhost/home/swa/sf/current/doc/source/tmp.html#CONFIGURATION
+ 13. file://localhost/home/swa/sf/current/doc/source/tmp.html#AEN158
+ 14. file://localhost/home/swa/sf/current/doc/source/tmp.html#ACTIONSFILE
+ 15. file://localhost/home/swa/sf/current/doc/source/tmp.html#FILTERFILE
+ 16. file://localhost/home/swa/sf/current/doc/source/tmp.html#QUICKSTART
+ 17. file://localhost/home/swa/sf/current/doc/source/tmp.html#CONTACT
+ 18. file://localhost/home/swa/sf/current/doc/source/tmp.html#COPYRIGHT
+ 19. file://localhost/home/swa/sf/current/doc/source/tmp.html#AEN1174
+ 20. file://localhost/home/swa/sf/current/doc/source/tmp.html#AEN1180
+ 21. file://localhost/home/swa/sf/current/doc/source/tmp.html#SEEALSO
+ 22. file://localhost/home/swa/sf/current/doc/source/tmp.html#APPENDIX
+ 23. file://localhost/home/swa/sf/current/doc/source/tmp.html#REGEX
+ 24. http://i.j.b/
+ 25. http://sourceforge.net/projects/ijbswa/
+ 26. http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/ijbswa/current/
+ 27. http://hobbes.nmsu.edu/cgi-bin/h-search?sh=1&button=Search&key=emxrt.zip&stype=all&sort=type&dir=%2Fpub%2Fos2%2Fdev%2Femx%2Fv0.9d
+ 28. http://hobbes.nmsu.edu/cgi-bin/h-search?sh=1&key=gnupack&stype=all&sort=type&dir=%2Fpub%2Fos2%2Fapps
+ 29. http://www.gnu.org/
+ 30. http://i.j.b/
+ 31. file://localhost/home/swa/sf/current/doc/source/tmp.html#ACTIONSFILE
+ 32. http://i.j.b/
+ 33. http://i.j.b/
+ 34. http://i.j.b/
+ 35. http://i.j.b/show-url-info
+ 36. http://i.j.b/
+ 37. http://www.perldoc.com/perl5.6/pod/perlre.html
+ 38. file://localhost/home/swa/sf/current/doc/source/tmp.html#REGEX
+ 39. http://i.j.b/
+ 40. http://sourceforge.net/tracker/?atid=361118&group_id=11118&func=browse
+ 41. http://sourceforge.net/mail/?group_id=11118
+ 42. http://sourceforge.net/tracker/?group_id=11118&atid=111118
+ 43. http://www.gnu.org/copyleft/gpl.html
+ 44. http://www.junkbusters.com/ht/en/ijbfaq.html
+ 45. http://www.waldherr.org/junkbuster/
+ 46. http://sourceforge.net/projects/ijbswa/
+ 47. http://sourceforge.net/projects/ijbswa
+ 48. http://ijbswa.sourceforge.net/
+ 49. http://i.j.b/
+ 50. http://www.junkbusters.com/ht/en/cookies.html
+ 51. http://www.waldherr.org/junkbuster/
+ 52. http://privacy.net/analyze/
+ 53. http://www.squid-cache.org/
+ 54. http://www.perldoc.com/perl5.6/pod/perlre.html