pcreposix - POSIX API for Perl-compatible regular expressions

int regcomp(regex_t *preg, const char *pattern,
     int cflags);

int regexec(regex_t *preg, const char *string,
     size_t nmatch, regmatch_t pmatch[], int eflags);

size_t regerror(int errcode, const regex_t *preg,
     char *errbuf, size_t errbuf_size);

void regfree(regex_t *preg);

This set of functions provides a POSIX-style API to the PCRE
regular expression package. See the pcre documentation for a
description of the native API, which contains additional
functionality.

The functions described here are just wrapper functions that
ultimately call the native API. Their prototypes are defined
in the pcreposix.h header file, and on Unix systems the
library itself is called pcreposix.a, so it can be accessed
by adding -lpcreposix to the command for linking an
application that uses them. Because the POSIX functions call
the native ones, it is also necessary to add -lpcre.

I have implemented only those option bits that can be
reasonably mapped to PCRE native options. In addition, the
options REG_EXTENDED and REG_NOSUB are defined with the
value zero. They have no effect, but since programs that are
written to the POSIX interface often use them, this makes it
easier to slot in PCRE as a replacement library. Other POSIX
options are not even defined.
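
As an illustration of the drop-in use described above, here
is a minimal sketch of code written to the POSIX interface.
The macro USE_PCREPOSIX, the file name demo.c, and the helper
is_all_digits() are hypothetical names introduced for this
example. Without the macro the sketch falls back to the
system regex.h so that it builds where PCRE is not installed.
The pattern was chosen to behave the same way under both
libraries for subjects that contain no newlines.

```c
/* Assumed build lines (the -l options are the ones named above):
 *   cc -DUSE_PCREPOSIX demo.c -lpcreposix -lpcre
 *   cc demo.c                     (system POSIX regex instead)
 */
#include <stddef.h>
#ifdef USE_PCREPOSIX            /* hypothetical switch for this sketch */
#include <pcreposix.h>
#else
#include <regex.h>
#endif

/* Returns 1 if s consists only of decimal digits, 0 if it does
 * not, and -1 if the pattern could not be compiled. */
int is_all_digits(const char *s)
{
    regex_t re;
    int rc;

    /* REG_EXTENDED and REG_NOSUB are defined as zero by pcreposix,
     * so passing them is harmless there. */
    if (regcomp(&re, "^[0-9]+$", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    /* No substrings are requested, so nmatch is 0 and pmatch NULL. */
    rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

A caller would simply test the result, for example
is_all_digits("12345") for a string read from user input.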

When PCRE is called via these functions, it is only the API
that is POSIX-like in style. The syntax and semantics of the
regular expressions themselves are still those of Perl,
subject to the setting of various PCRE options, as described
below.

The header for these functions is supplied as pcreposix.h to
avoid any potential clash with other POSIX libraries. It
can, of course, be renamed or aliased as regex.h, which is
the "correct" name. It provides two structure types, regex_t
for compiled internal forms, and regmatch_t for returning
captured substrings. It also defines some constants whose
names start with "REG_"; these are used for setting options
and identifying error codes.

The function regcomp() is called to compile a pattern into
an internal form. The pattern is a C string terminated by a
binary zero, and is passed in the argument pattern. The preg
argument is a pointer to a regex_t structure which is used
as a base for storing information about the compiled
expression.

The argument cflags is either zero, or contains one or more
of the bits defined by the following macros:

  REG_ICASE

The PCRE_CASELESS option is set when the expression is
passed for compilation to the native function.

  REG_NEWLINE

The PCRE_MULTILINE option is set when the expression is
passed for compilation to the native function.

In the absence of these flags, no options are passed to the
native function. This means that the regex is compiled with
PCRE default semantics. In particular, the way it handles
newline characters in the subject string is the Perl way,
not the POSIX way. Note that setting PCRE_MULTILINE has only
some of the effects specified for REG_NEWLINE. It does not
affect the way newlines are matched by . (they aren't) or a
negative class such as [^a] (they are).

The yield of regcomp() is zero on success, and non-zero
otherwise. The preg structure is filled in on success, and one
member of the structure is publicized: re_nsub contains the
number of capturing subpatterns in the regular expression.
Various error codes are defined in the header file.
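
The use of regcomp() and the re_nsub member can be sketched
as follows. The helper count_groups() and the macro
USE_PCREPOSIX are hypothetical names for this example;
without the macro the system regex.h is used so that the
sketch builds where PCRE is not installed. REG_EXTENDED is
added so that the same parenthesis syntax applies under the
system library; under pcreposix it is zero.

```c
#include <stddef.h>
#ifdef USE_PCREPOSIX            /* hypothetical switch for this sketch */
#include <pcreposix.h>
#else
#include <regex.h>
#endif

/* Compiles pattern with the given cflags (zero, REG_ICASE,
 * REG_NEWLINE, or both) and returns the number of capturing
 * subpatterns, or -1 if compilation fails. */
long count_groups(const char *pattern, int cflags)
{
    regex_t re;
    long n;

    if (regcomp(&re, pattern, cflags | REG_EXTENDED) != 0)
        return -1;              /* non-zero yield: pattern rejected */

    n = (long)re.re_nsub;       /* the one publicized member */
    regfree(&re);
    return n;
}
```

For instance, count_groups("(a)(b(c))", 0) inspects a pattern
with three sets of capturing parentheses.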

The function regexec() is called to match a pre-compiled
pattern preg against a given string, which is terminated by
a zero byte, subject to the options in eflags. These can be:

  REG_NOTBOL

The PCRE_NOTBOL option is set when calling the underlying
PCRE matching function.

  REG_NOTEOL

The PCRE_NOTEOL option is set when calling the underlying
PCRE matching function.

The portion of the string that was matched, and also any
captured substrings, are returned via the pmatch argument,
which points to an array of nmatch structures of type
regmatch_t, containing the members rm_so and rm_eo. These
contain the offset to the first character of each substring
and the offset to the first character after the end of each
substring, respectively. The 0th element of the vector
relates to the entire portion of string that was matched;
subsequent elements relate to the capturing subpatterns of
the regular expression. Unused entries in the array have
both structure members set to -1.
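
The rm_so and rm_eo offsets can be turned into a substring as
in the following sketch. The helper extract_group(), the
limit MAX_GROUPS, and the macro USE_PCREPOSIX are
hypothetical names for this example; without the macro the
system regex.h is used so that the sketch builds where PCRE
is not installed.

```c
#include <stddef.h>
#include <string.h>
#ifdef USE_PCREPOSIX            /* hypothetical switch for this sketch */
#include <pcreposix.h>
#else
#include <regex.h>
#endif

#define MAX_GROUPS 10           /* arbitrary limit for this sketch */

/* Copies substring number `group` of the first match into out
 * (group 0 is the whole match). An unset group yields an empty
 * string. Returns 0 on a match, otherwise the non-zero code from
 * regcomp() or regexec(), or -1 if out_size is 0. */
int extract_group(const char *pattern, const char *subject,
                  size_t group, char *out, size_t out_size)
{
    regex_t re;
    regmatch_t pm[MAX_GROUPS];
    int rc;

    if (out_size == 0)
        return -1;
    out[0] = '\0';

    rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc != 0)
        return rc;

    rc = regexec(&re, subject, MAX_GROUPS, pm, 0);
    if (rc == 0 && group < MAX_GROUPS && pm[group].rm_so != -1) {
        /* rm_eo is the offset just past the end of the substring */
        size_t len = (size_t)(pm[group].rm_eo - pm[group].rm_so);
        if (len >= out_size)
            len = out_size - 1; /* truncate to fit the buffer */
        memcpy(out, subject + pm[group].rm_so, len);
        out[len] = '\0';
    }
    regfree(&re);
    return rc;
}
```

A caller passes a buffer and its size, for example
extract_group("([a-z]+)@([a-z]+)", line, 2, buf, sizeof buf),
where line and buf belong to the caller.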

A successful match yields a zero return; various error codes
are defined in the header file, of which REG_NOMATCH is the
"expected" failure code.
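
One common way to use the return value together with
REG_NOTBOL is to scan a string for successive matches, as in
this sketch. The helper count_matches() and the macro
USE_PCREPOSIX are hypothetical names for this example;
without the macro the system regex.h is used so that the
sketch builds where PCRE is not installed. The loop stops on
any non-zero return, of which REG_NOMATCH is the usual one.

```c
#include <stddef.h>
#ifdef USE_PCREPOSIX            /* hypothetical switch for this sketch */
#include <pcreposix.h>
#else
#include <regex.h>
#endif

/* Counts non-overlapping matches of pattern in subject.
 * Returns -1 if the pattern cannot be compiled. */
long count_matches(const char *pattern, const char *subject)
{
    regex_t re;
    regmatch_t pm[1];
    const char *p = subject;
    long count = 0;
    int eflags = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    while (regexec(&re, p, 1, pm, eflags) == 0) {
        count++;
        if (pm[0].rm_eo > pm[0].rm_so) {
            p += pm[0].rm_eo;           /* resume after the match */
        } else {
            if (p[pm[0].rm_eo] == '\0') /* empty match at the end */
                break;
            p += pm[0].rm_eo + 1;       /* step past an empty match */
        }
        /* Later calls start mid-string, so p is no longer the
         * beginning of a line. */
        eflags = REG_NOTBOL;
    }
    regfree(&re);
    return count;
}
```

Because REG_NOTBOL is set after the first call, a pattern
anchored with ^ is not matched again at each resumption
point.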

The regerror() function maps a non-zero error code from
either regcomp() or regexec() to a printable message. If preg
is not NULL, the error should have arisen from the use of that
structure. A message terminated by a binary zero is placed
in errbuf. The length of the message, including the zero, is
limited to errbuf_size. The yield of the function is the
size of buffer needed to hold the whole message.
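
Since the yield is the size needed for the whole message, a
caller can ask for the size first and then allocate exactly
that much, as in this sketch. The helper compile_error_text()
and the macro USE_PCREPOSIX are hypothetical names for this
example; without the macro the system regex.h is used so that
the sketch builds where PCRE is not installed.

```c
#include <stddef.h>
#include <stdlib.h>
#ifdef USE_PCREPOSIX            /* hypothetical switch for this sketch */
#include <pcreposix.h>
#else
#include <regex.h>
#endif

/* Tries to compile pattern. Returns NULL if it compiles (or if
 * memory runs out), otherwise a malloc()ed copy of the error
 * message, which the caller must free(). */
char *compile_error_text(const char *pattern)
{
    regex_t re;
    char probe[1];
    char *msg;
    size_t needed;
    int rc;

    rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc == 0) {
        regfree(&re);
        return NULL;
    }

    /* First call: a one-byte buffer holds only the terminating
     * zero, but the yield is the size of the full message. */
    needed = regerror(rc, &re, probe, sizeof probe);

    msg = malloc(needed);
    if (msg != NULL)
        regerror(rc, &re, msg, needed);  /* second call: full text */
    return msg;
}
```

The wording of the message depends on the library in use, so
the caller should treat it as text for display only.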

Compiling a regular expression causes memory to be allocated
and associated with the preg structure. The function
regfree() frees all such memory, after which preg may no longer
be used as a compiled expression.
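
The usual lifetime is therefore one regcomp(), any number of
regexec() calls, and one regfree(), as in this sketch. The
helper count_matching() and the macro USE_PCREPOSIX are
hypothetical names for this example; without the macro the
system regex.h is used so that the sketch builds where PCRE
is not installed.

```c
#include <stddef.h>
#ifdef USE_PCREPOSIX            /* hypothetical switch for this sketch */
#include <pcreposix.h>
#else
#include <regex.h>
#endif

/* Returns how many of the n subject strings match pattern, or
 * -1 if the pattern cannot be compiled. */
long count_matching(const char *pattern,
                    const char *const *subjects, size_t n)
{
    regex_t re;
    long hits = 0;
    size_t i;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;              /* nothing is freed on this path */

    /* The compiled form is reused for every subject. */
    for (i = 0; i < n; i++) {
        if (regexec(&re, subjects[i], 0, NULL, 0) == 0)
            hits++;
    }

    regfree(&re);               /* re must not be used after this */
    return hits;
}
```

Compiling once outside the loop avoids repeating the
allocation for every subject string.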

Philip Hazel <ph10@cam.ac.uk>
University Computing Service,
Cambridge CB2 3QG, England.
Phone: +44 1223 334714

Copyright (c) 1997-2000 University of Cambridge.