J2. Peculiarities of PCRE Regular Expressions

A regular expression is a pattern that is matched against a subject string from left to right. Most characters stand for themselves in a pattern, and match the corresponding characters in the subject.

The power of regular expressions comes from the ability to include alternatives and repetitions in the pattern. These are encoded in the pattern by the use of metacharacters, which do not stand for themselves but instead are interpreted in a special way.
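As a small illustration of literal characters, alternatives, and repetitions, here is a minimal sketch. It uses Python's standard `re` module rather than PCRE itself, on the assumption that the two agree for these basic constructs. The sample strings and variable names are invented for the example.

```python
import re

# Ordinary characters stand for themselves. The subject is scanned from
# left to right, so the first occurrence of "cat" is the one reported.
literal = re.search(r"cat", "concatenate")

# "|" separates alternatives: either "a" or "e" between "gr" and "y".
alternation = re.search(r"gr(a|e)y", "a grey sky")

# "+" repeats the preceding item one or more times.
repetition = re.search(r"lo+ng", "a looong wait")

print(literal.group(0), literal.start())
print(alternation.group(0))
print(repetition.group(0))
```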

There are two different sets of metacharacters: those recognized anywhere in a pattern except within square brackets, and those recognized within square brackets. Outside square brackets, the metacharacters are as follows:

Symbol  Value

\       general escape character with several uses
^       assert start of string (or line, in multiline mode)
$       assert end of string (or line, in multiline mode)
.       match any character except newline (by default)
[       start character class definition
]       end character class definition
|       start alternative branch
(       start subpattern
)       end subpattern
?       extends the meaning of (
        also 0 or 1 quantifier
        also quantifier minimizer
*       0 or more quantifier
+       1 or more quantifier
        also "possessive quantifier"
{       start min/max quantifier
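The following sketch exercises several of the metacharacters listed above. It is written against Python's standard `re` module, not PCRE, on the assumption that both behave the same way for the constructs shown. The possessive use of "+" is left out because Python versions before 3.11 do not accept it. All sample strings and variable names are hypothetical.

```python
import re

text = "a.b\naxb"

# "\" removes the special meaning of the metacharacter that follows it.
escaped_dot = re.findall(r"a\.b", text)

# "." matches any character except newline by default.
any_char = re.findall(r"a.b", text)

# "^" asserts start of string, or start of each line in multiline mode.
start_of_string = re.findall(r"^a", text)
start_of_line = re.findall(r"^a", text, re.MULTILINE)

# "?" as a 0 or 1 quantifier: the "u" is optional.
optional = re.findall(r"colou?r", "color colour")

# "?" after another quantifier minimizes it (matches as little as possible).
greedy = re.search(r"<.+>", "<b>bold</b>").group(0)
lazy = re.search(r"<.+?>", "<b>bold</b>").group(0)

# "?" after "(" extends its meaning, here to a non-capturing group.
grouped = re.findall(r"(?:ab)+", "ababab ab")

# "{" starts a min/max quantifier: two or three digits as a whole word.
bounded = re.findall(r"\b\d{2,3}\b", "7 42 123 4567")
```

Comparing `greedy` with `lazy` shows the effect of the minimizer: the first takes the whole of `<b>bold</b>`, the second stops at `<b>`.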

Part of a pattern that is in square brackets is called a "character class". In a character class the only metacharacters are:

Symbol  Value

\       general escape character
^       negate the class, but only if the first character
-       indicates character range
[       POSIX character class (only if followed by POSIX syntax)
]       terminates the character class
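A short sketch of how these metacharacters behave inside a character class follows. As before, it uses Python's standard `re` module as a stand-in for PCRE, assuming identical behavior for the cases shown. POSIX classes such as `[:alpha:]` are not illustrated because Python's `re` module does not implement them. The sample strings and variable names are invented for the example.

```python
import re

# "^" negates the class only when it is the first character.
negated = re.findall(r"[^0-9]", "a1b2")

# Anywhere else in the class, "^" is an ordinary character.
literal_caret = re.findall(r"[0-9^]", "1^2")

# "-" between two characters indicates a range.
ranged = re.findall(r"[a-c]", "abcd")

# "\" escapes "]" and "\" so they can be members of the class.
escaped = re.findall(r"[\]\\]", r"a]b\c")

# Metacharacters from outside a class, such as ".", "*" and "+",
# have no special meaning inside one.
plain = re.findall(r"[.*+]", "1.5*2+3")
```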