- * Revisions :
- * $Log: urlmatch.c,v $
- * Revision 1.44 2008/05/04 16:18:32 fabiankeil
- * Provide parse_http_url() with a third parameter to specify
- * whether or not URLs without protocol are acceptable.
- *
- * Revision 1.43 2008/05/04 13:30:55 fabiankeil
- * Streamline parse_http_url()'s prototype.
- *
- * Revision 1.42 2008/05/04 13:24:16 fabiankeil
- * If the method isn't CONNECT, reject URLs without protocol.
- *
- * Revision 1.41 2008/05/02 09:51:34 fabiankeil
- * In parse_http_url(), don't muck around with values
- * that are none of its business: require an initialized
- * http structure and never unset http->ssl.
- *
- * Revision 1.40 2008/04/23 16:12:28 fabiankeil
- * Free with freez().
- *
- * Revision 1.39 2008/04/22 16:27:42 fabiankeil
- * In parse_http_request(), remove a pointless
- * temporary variable and free the buffer earlier.
- *
- * Revision 1.38 2008/04/18 05:17:18 fabiankeil
- * Mark simplematch()'s parameters as immutable.
- *
- * Revision 1.37 2008/04/17 14:53:29 fabiankeil
- * Move simplematch() into urlmatch.c as it's only
- * used to match (old-school) domain patterns.
- *
- * Revision 1.36 2008/04/14 18:19:48 fabiankeil
- * Remove now-pointless cast in create_url_spec().
- *
- * Revision 1.35 2008/04/14 18:11:21 fabiankeil
- * The compiler might not notice it, but the buffer passed to
- * create_url_spec() is modified later on and thus shouldn't
- * be declared immutable.
- *
- * Revision 1.34 2008/04/13 13:32:07 fabiankeil
- * Factor URL pattern compilation out of create_url_spec().
- *
- * Revision 1.33 2008/04/12 14:03:13 fabiankeil
- * Remove an obvious comment and improve another one.
- *
- * Revision 1.32 2008/04/12 12:38:06 fabiankeil
- * Factor out duplicated code to compile host, path and tag patterns.
- *
- * Revision 1.31 2008/04/10 14:41:04 fabiankeil
- * Ditch url_spec's path member now that it's no longer used.
- *
- * Revision 1.30 2008/04/10 04:24:24 fabiankeil
- * Stop duplicating the plain text representation of the path regex
- * (and keeping the copy around). Once the regex is compiled it's no
- * longer useful.
- *
- * Revision 1.29 2008/04/10 04:17:56 fabiankeil
- * In url_match(), check the right member for NULL when determining
- * whether there's a path regex to execute. Looking for a plain-text
- * representation works as well, but it looks "interesting" and that
- * member will be removed soonish anyway.
- *
- * Revision 1.28 2008/04/08 16:07:39 fabiankeil
- * Make it harder to mistake url_match()'s
- * second parameter for a url_spec.
- *
- * Revision 1.27 2008/04/08 15:44:33 fabiankeil
- * Save a bit of memory (and a few cpu cycles) by not bothering to
- * compile slash-only path regexes that don't affect the result.
- *
- * Revision 1.26 2008/04/07 16:57:18 fabiankeil
- * - Use free_url_spec() more consistently.
- * - Let it reset url->dcount just in case.
- *
- * Revision 1.25 2008/04/06 15:18:38 fabiankeil
- * Oh well, rename the --enable-pcre-host-patterns option to
- * --enable-extended-host-patterns as it's not really PCRE syntax.
- *
- * Revision 1.24 2008/04/06 14:54:26 fabiankeil
- * Use PCRE syntax in host patterns when configured
- * with --enable-pcre-host-patterns.
- *
- * Revision 1.23 2008/04/05 12:19:20 fabiankeil
- * Factor compile_host_pattern() out of create_url_spec().
- *
- * Revision 1.22 2008/03/30 15:02:32 fabiankeil
- * SZitify unknown_method().
- *
- * Revision 1.21 2007/12/24 16:34:23 fabiankeil
- * Band-aid (and micro-optimization) that makes it less likely to run out of
- * stack space with overly-complex path patterns. Probably masks the problem
- * reported by Lee in #1856679. Hohoho.
- *
- * Revision 1.20 2007/09/02 15:31:20 fabiankeil
- * Move match_portlist() from filter.c to urlmatch.c.
- * It's used for url matching, not for filtering.
- *
- * Revision 1.19 2007/09/02 13:42:11 fabiankeil
- * - Allow port lists in url patterns.
- * - Ditch unused url_spec member pathlen.
- *
- * Revision 1.18 2007/07/30 16:42:21 fabiankeil
- * Move the method check into unknown_method()
- * and loop through the known methods instead
- * of using a screen-long OR chain.
- *
- * Revision 1.17 2007/04/15 16:39:21 fabiankeil
- * Introduce tags as alternative way to specify which
- * actions apply to a request. At the moment tags can be
- * created based on client and server headers.
- *
- * Revision 1.16 2007/02/13 13:59:24 fabiankeil
- * Remove redundant log message.
- *
- * Revision 1.15 2007/01/28 16:11:23 fabiankeil
- * Accept WebDAV methods for subversion
- * in parse_http_request(). Closes FR 1581425.
- *
- * Revision 1.14 2007/01/06 14:23:56 fabiankeil
- * Fix gcc43 warnings. Mark *csp as immutable
- * for parse_http_url() and url_match().
- * Replace a sprintf call with snprintf.
- *
- * Revision 1.13 2006/12/06 19:50:54 fabiankeil
- * parse_http_url() now handles intercepted
- * HTTP request lines as well. Moved parts
- * of parse_http_url()'s code into
- * init_domain_components() so that it can
- * be reused in chat().
- *
- * Revision 1.12 2006/07/18 14:48:47 david__schmidt
- * Reorganizing the repository: swapping out what was HEAD (the old 3.1 branch)
- * with what was really the latest development (the v_3_0_branch branch)
- *
- * Revision 1.10.2.7 2003/05/17 15:57:24 oes
- * - parse_http_url now checks memory allocation failure for
- * duplication of "*" URL and rejects "*something" URLs
- * Closes bug #736344
- * - Added a comment to what might look like a bug in
- * create_url_spec (see !bug #736931)
- * - Comment cosmetics
- *
- * Revision 1.10.2.6 2003/05/07 12:39:48 oes
- * Fix typo: Default port for https URLs is 443, not 143.
- * Thanks to Scott Tregear for spotting this one.
- *
- * Revision 1.10.2.5 2003/02/28 13:09:29 oes
- * Fixed a rare double free condition as per Bug #694713
- *
- * Revision 1.10.2.4 2003/02/28 12:57:44 oes
- * Moved freeing of http request structure to its owner
- * as per Dan Price's observations in Bug #694713
- *
- * Revision 1.10.2.3 2002/11/12 16:50:40 oes
- * Fixed memory leak in parse_http_request() reported by Oliver Stoeneberg. Fixes bug #637073
- *
- * Revision 1.10.2.2 2002/09/25 14:53:15 oes
- * Added basic support for OPTIONS and TRACE HTTP methods:
- * parse_http_url now recognizes the "*" URI as well as
- * the OPTIONS and TRACE method keywords.
- *
- * Revision 1.10.2.1 2002/06/06 19:06:44 jongfoster
- * Adding support for proprietary Microsoft WebDAV extensions
- *
- * Revision 1.10 2002/05/12 21:40:37 jongfoster
- * - Removing some unused code
- *
- * Revision 1.9 2002/04/04 00:36:36 gliptak
- * always use pcre for matching
- *
- * Revision 1.8 2002/04/03 23:32:47 jongfoster
- * Fixing memory leak on error
- *
- * Revision 1.7 2002/03/26 22:29:55 swa
- * we have a new homepage!
- *
- * Revision 1.6 2002/03/24 13:25:43 swa
- * name change related issues
- *
- * Revision 1.5 2002/03/13 00:27:05 jongfoster
- * Killing warnings
- *
- * Revision 1.4 2002/03/07 03:46:17 oes
- * Fixed compiler warnings
- *
- * Revision 1.3 2002/03/03 14:51:11 oes
- * Fixed CLF logging: Added ocmd member for client's request to struct http_request
- *
- * Revision 1.2 2002/01/21 00:14:09 jongfoster
- * Correcting comment style
- * Fixing an uninitialized memory bug in create_url_spec()
- *
- * Revision 1.1 2002/01/17 20:53:46 jongfoster
- * Moving all our URL and URL pattern parsing code to the same file - it
- * was scattered around in filters.c, loaders.c and parsers.c.
- *
- * Providing a single, simple url_match(pattern,url) function - rather than
- * the 3-line match routine which was repeated all over the place.
- *
- * Renaming free_url to free_url_spec, since it frees a struct url_spec.
- *
- * Providing parse_http_url() so that URLs can be parsed without faking a
- * HTTP request line for parse_http_request() or repeating the parsing
- * code (both of which were techniques that were actually in use).
- *
- * Standardizing that struct http_request is used to represent a URL, and
- * struct url_spec is used to represent a URL pattern. (Before, URLs were
- * represented as separate variables and a partially-filled-in url_spec).
- *
- *