This is regex.info, produced by makeinfo version 4.0 from regex.texi.

INFO-DIR-SECTION C library code
START-INFO-DIR-ENTRY
* Regex: (regex).                  Regular expression library.
END-INFO-DIR-ENTRY

  This file documents the GNU regular expression library.

  Copyright (C) 1992, 1993 Free Software Foundation, Inc.

  Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

  Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided also that the
section entitled "GNU General Public License" is included exactly as in
the original, and provided that the entire resulting derived work is
distributed under the terms of a permission notice identical to this
one.

  Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that the section entitled "GNU General Public License"
may be included in a translation approved by the Free Software
Foundation instead of in the original English.


File: regex.info,  Node: Top,  Next: Overview,  Prev: (dir),  Up: (dir)

Regular Expression Library
**************************

  This manual documents how to program with the GNU regular expression
library.  This is edition 0.12a of the manual, 19 September 1992.

  The first part of this master menu lists the major nodes in this Info
document, including the index.  The rest of the menu lists all the
lower level nodes in the document.

* Menu:

* Overview::
* Regular Expression Syntax::
* Common Operators::
* GNU Operators::
* GNU Emacs Operators::
* What Gets Matched?::
* Programming with Regex::
* Copying::                     Copying and sharing Regex.
* Index::                       General index.
 --- The Detailed Node Listing ---

Regular Expression Syntax

* Syntax Bits::
* Predefined Syntaxes::
* Collating Elements vs. Characters::
* The Backslash Character::

Common Operators

* Match-self Operator::                 Ordinary characters.
* Match-any-character Operator::        .
* Concatenation Operator::              Juxtaposition.
* Repetition Operators::                *  +  ? {}
* Alternation Operator::                |
* List Operators::                      [...]  [^...]
* Grouping Operators::                  (...)
* Back-reference Operator::             \digit
* Anchoring Operators::                 ^  $

Repetition Operators

* Match-zero-or-more Operator::  *
* Match-one-or-more Operator::   +
* Match-zero-or-one Operator::   ?
* Interval Operators::           {}

List Operators (`[' ... `]' and `[^' ... `]')

* Character Class Operators::   [:class:]
* Range Operator::          start-end

Anchoring Operators

* Match-beginning-of-line Operator::  ^
* Match-end-of-line Operator::        $

GNU Operators

* Word Operators::
* Buffer Operators::

Word Operators

* Non-Emacs Syntax Tables::
* Match-word-boundary Operator::        \b
* Match-within-word Operator::          \B
* Match-beginning-of-word Operator::    \<
* Match-end-of-word Operator::          \>
* Match-word-constituent Operator::     \w
* Match-non-word-constituent Operator:: \W

Buffer Operators

* Match-beginning-of-buffer Operator::  \`
* Match-end-of-buffer Operator::        \'

GNU Emacs Operators

* Syntactic Class Operators::

Syntactic Class Operators

* Emacs Syntax Tables::
* Match-syntactic-class Operator::      \sCLASS
* Match-not-syntactic-class Operator::  \SCLASS

Programming with Regex

* GNU Regex Functions::
* POSIX Regex Functions::
* BSD Regex Functions::

GNU Regex Functions

* GNU Pattern Buffers::         The re_pattern_buffer type.
* GNU Regular Expression Compiling::  re_compile_pattern ()
* GNU Matching::                re_match ()
* GNU Searching::               re_search ()
* Matching/Searching with Split Data::  re_match_2 (), re_search_2 ()
* Searching with Fastmaps::     re_compile_fastmap ()
* GNU Translate Tables::        The `translate' field.
* Using Registers::             The re_registers type and related fns.
* Freeing GNU Pattern Buffers::  regfree ()

POSIX Regex Functions

* POSIX Pattern Buffers::               The regex_t type.
* POSIX Regular Expression Compiling::  regcomp ()
* POSIX Matching::                      regexec ()
* Reporting Errors::                    regerror ()
* Using Byte Offsets::                  The regmatch_t type.
* Freeing POSIX Pattern Buffers::       regfree ()

BSD Regex Functions

* BSD Regular Expression Compiling::    re_comp ()
* BSD Searching::                       re_exec ()


File: regex.info,  Node: Overview,  Next: Regular Expression Syntax,  Prev: Top,  Up: Top

Overview
********

  A "regular expression" (or "regexp", or "pattern") is a text string
that describes some (mathematical) set of strings.  A regexp R
"matches" a string S if S is in the set of strings described by R.

  Using the Regex library, you can:

   * see if a string matches a specified pattern as a whole, and

   * search within a string for a substring matching a specified
     pattern.


  Some regular expressions match only one string, i.e., the set they
describe has only one member.  For example, the regular expression
`foo' matches the string `foo' and no others.  Other regular
expressions match more than one string, i.e., the set they describe has
more than one member.  For example, the regular expression `f*' matches
the set of strings made up of any number (including zero) of `f's.  As
you can see, some characters in regular expressions match themselves
(such as `f') and some don't (such as `*'); the ones that don't match
themselves instead let you specify patterns that describe many
different strings.

  To either match or search for a regular expression with the Regex
library functions, you must first compile it with a Regex pattern
compiling function.  A "compiled pattern" is a regular expression
converted to the internal format used by the library functions.  Once
you've compiled a pattern, you can use it for matching or searching any
number of times.
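
  As a minimal sketch of this compile-once, use-many-times pattern, the
code below uses the POSIX functions `regcomp', `regexec' and `regfree'
(*note POSIX Regex Functions::).  The helper name `count_matches' is
hypothetical and is not part of Regex.

```c
#include <regex.h>
#include <stddef.h>

/* Compile PATTERN once (POSIX extended syntax), then reuse the compiled
   pattern on each of the N strings in TEXTS.  Returns the number of
   strings in which a match is found, or -1 if PATTERN does not compile.
   `count_matches' is a hypothetical helper, not part of Regex.  */
int
count_matches (const char *pattern, const char *const *texts, size_t n)
{
  regex_t re;
  int count = 0;
  size_t i;

  /* Compile once.  REG_NOSUB: we only want a yes/no answer.  */
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;

  /* Use the same compiled pattern any number of times.  */
  for (i = 0; i < n; i++)
    if (regexec (&re, texts[i], 0, NULL, 0) == 0)
      count++;

  regfree (&re);
  return count;
}
```

  Given the four strings `', `f', `fff' and `fofo', the anchored
pattern `^f*$' matches the first three as a whole, so the helper
returns 3; the unanchored pattern `fo' finds a matching substring only
in `fofo', so the helper returns 1.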

  The Regex library consists of two source files: `regex.h' and
`regex.c'.  Regex provides three groups of functions with which you can
operate on regular expressions.  One group--the GNU group--is more
powerful but not completely compatible with the other two, namely the
POSIX and Berkeley UNIX groups; its interface was designed specifically
for GNU.  The other groups have the same interfaces as do the regular
expression functions in POSIX and Berkeley UNIX.

  We wrote this manual with programmers in mind, not users of
programs--such as Emacs--that use Regex.  We describe the Regex library
in its entirety, not how to write regular expressions that a particular
program understands.


File: regex.info,  Node: Regular Expression Syntax,  Next: Common Operators,  Prev: Overview,  Up: Top

Regular Expression Syntax
*************************

  "Characters" are things you can type.  "Operators" are things in a
regular expression that match one or more characters.  You compose
regular expressions from operators, which in turn you specify using one
or more characters.

  Most characters represent what we call the match-self operator, i.e.,
they match themselves; we call these characters "ordinary".  Other
characters represent either all or parts of fancier operators; e.g.,
`.' represents what we call the match-any-character operator (which, no
surprise, matches (almost) any character); we call these characters
"special".  Two different things determine what characters represent
what operators:

  1. the regular expression syntax your program has told the Regex
     library to recognize, and

  2. the context of the character in the regular expression.

  In the following sections, we describe these things in more detail.

* Menu:

* Syntax Bits::
* Predefined Syntaxes::
* Collating Elements vs. Characters::
* The Backslash Character::


File: regex.info,  Node: Syntax Bits,  Next: Predefined Syntaxes,  Up: Regular Expression Syntax

Syntax Bits
===========

  In any particular syntax for regular expressions, some characters are
always special, others are sometimes special, and others are never
special.  The particular syntax that Regex recognizes for a given
regular expression depends on the value in the `syntax' field of the
pattern buffer of that regular expression.

  You get a pattern buffer by compiling a regular expression.  *Note
GNU Pattern Buffers::, and *Note POSIX Pattern Buffers::, for more
information on pattern buffers.  *Note GNU Regular Expression
Compiling::, *Note POSIX Regular Expression Compiling::, and *Note BSD
Regular Expression Compiling::, for more information on compiling.

  Regex considers the value of the `syntax' field to be a collection of
bits; we refer to these bits as "syntax bits".  In most cases, they
affect what characters represent what operators.  We describe the
meanings of the operators to which we refer in *Note Common Operators::,
*Note GNU Operators::, and *Note GNU Emacs Operators::.

  For reference, here is the complete list of syntax bits, in
alphabetical order:

`RE_BACKSLASH_ESCAPE_IN_LISTS'
     If this bit is set, then `\' inside a list (*note List Operators::)
     quotes (makes ordinary, if it's special) the following character;
     if this bit isn't set, then `\' is an ordinary character inside
     lists.  (*Note The Backslash Character::, for what `\' does
     outside of lists.)

`RE_BK_PLUS_QM'
     If this bit is set, then `\+' represents the match-one-or-more
     operator and `\?' represents the match-zero-or-one operator; if
     this bit isn't set, then `+' represents the match-one-or-more
     operator and `?' represents the match-zero-or-one operator.  This
     bit is irrelevant if `RE_LIMITED_OPS' is set.

`RE_CHAR_CLASSES'
     If this bit is set, then you can use character classes in lists;
     if this bit isn't set, then you can't.

`RE_CONTEXT_INDEP_ANCHORS'
     If this bit is set, then `^' and `$' are special anywhere outside
     a list; if this bit isn't set, then these characters are special
     only in certain contexts.  *Note Match-beginning-of-line
     Operator::, and *Note Match-end-of-line Operator::.

`RE_CONTEXT_INDEP_OPS'
     If this bit is set, then certain characters are special anywhere
     outside a list; if this bit isn't set, then those characters are
     special only in some contexts and are ordinary elsewhere.
     Specifically, if this bit isn't set then `*', and (if the syntax
     bit `RE_LIMITED_OPS' isn't set) `+' and `?' (or `\+' and `\?',
     depending on the syntax bit `RE_BK_PLUS_QM') represent repetition
     operators only if they're not first in a regular expression or
     just after an open-group or alternation operator.  The same holds
     for `{' (or `\{', depending on the syntax bit `RE_NO_BK_BRACES') if
     it is the beginning of a valid interval and the syntax bit
     `RE_INTERVALS' is set.

`RE_CONTEXT_INVALID_OPS'
     If this bit is set, then repetition and alternation operators
     can't be in certain positions within a regular expression.
     Specifically, the regular expression is invalid if it has:

        * a repetition operator first in the regular expression or just
          after a match-beginning-of-line, open-group, or alternation
          operator; or

        * an alternation operator first or last in the regular
          expression, just before a match-end-of-line operator, or just
          after an alternation or open-group operator.


     If this bit isn't set, then you can put the characters
     representing the repetition and alternation operators anywhere in
     a regular expression.  Whether or not they will in fact be
     operators in certain positions depends on other syntax bits.

`RE_DOT_NEWLINE'
     If this bit is set, then the match-any-character operator matches
     a newline; if this bit isn't set, then it doesn't.

`RE_DOT_NOT_NULL'
     If this bit is set, then the match-any-character operator doesn't
     match a null character; if this bit isn't set, then it does.

`RE_INTERVALS'
     If this bit is set, then Regex recognizes interval operators; if
     this bit isn't set, then it doesn't.

`RE_LIMITED_OPS'
     If this bit is set, then Regex doesn't recognize the
     match-one-or-more, match-zero-or-one or alternation operators; if
     this bit isn't set, then it does.

`RE_NEWLINE_ALT'
     If this bit is set, then newline represents the alternation
     operator; if this bit isn't set, then newline is ordinary.

`RE_NO_BK_BRACES'
     If this bit is set, then `{' represents the open-interval operator
     and `}' represents the close-interval operator; if this bit isn't
     set, then `\{' represents the open-interval operator and `\}'
     represents the close-interval operator.  This bit is relevant only
     if `RE_INTERVALS' is set.

`RE_NO_BK_PARENS'
     If this bit is set, then `(' represents the open-group operator and
     `)' represents the close-group operator; if this bit isn't set,
     then `\(' represents the open-group operator and `\)' represents
     the close-group operator.

`RE_NO_BK_REFS'
     If this bit is set, then Regex doesn't recognize `\'DIGIT as the
     back reference operator; if this bit isn't set, then it does.

`RE_NO_BK_VBAR'
     If this bit is set, then `|' represents the alternation operator;
     if this bit isn't set, then `\|' represents the alternation
     operator.  This bit is irrelevant if `RE_LIMITED_OPS' is set.

`RE_NO_EMPTY_RANGES'
     If this bit is set, then a regular expression with a range whose
     ending point collates lower than its starting point is invalid; if
     this bit isn't set, then Regex considers such a range to be empty.

`RE_UNMATCHED_RIGHT_PAREN_ORD'
     If this bit is set and the regular expression has no matching
     open-group operator, then Regex considers what would otherwise be
     a close-group operator (based on how `RE_NO_BK_PARENS' is set) to
     match `)'.
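
  As an illustration of two of these bits, the sketch below relies on
the POSIX function `regcomp' (*note POSIX Regular Expression
Compiling::), which uses the POSIX basic syntax unless you pass
`REG_EXTENDED'.  Per the predefined syntaxes, only the extended syntax
sets `RE_NO_BK_PARENS' and `RE_NO_BK_BRACES'.  `pattern_found' is a
hypothetical helper, not part of Regex.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if PATTERN, compiled with CFLAGS, matches somewhere in
   TEXT, 0 if it does not, and -1 if PATTERN fails to compile.
   CFLAGS is 0 for the POSIX basic syntax or REG_EXTENDED for the
   POSIX extended syntax.  Hypothetical helper, not part of Regex.  */
int
pattern_found (const char *pattern, int cflags, const char *text)
{
  regex_t re;
  int rc;

  if (regcomp (&re, pattern, cflags | REG_NOSUB) != 0)
    return -1;
  rc = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

  With this helper, the same characters are operators under one syntax
and ordinary under the other:

     pattern_found ("^(ab){2}$", REG_EXTENDED, "abab")   => 1
     pattern_found ("^\\(ab\\)\\{2\\}$", 0, "abab")      => 1
     pattern_found ("^(ab){2}$", 0, "abab")              => 0
     pattern_found ("^(ab){2}$", 0, "(ab){2}")           => 1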


File: regex.info,  Node: Predefined Syntaxes,  Next: Collating Elements vs. Characters,  Prev: Syntax Bits,  Up: Regular Expression Syntax

Predefined Syntaxes
===================

  If you're programming with Regex, you can set a pattern buffer's
(*note GNU Pattern Buffers::, and *Note POSIX Pattern Buffers::)
`syntax' field either to an arbitrary combination of syntax bits (*note
Syntax Bits::) or else to the configurations defined by Regex.  These
configurations define the syntaxes used by certain programs--GNU Emacs,
POSIX Awk, traditional Awk, Grep, Egrep--in addition to syntaxes for
POSIX basic and extended regular expressions.

  The predefined syntaxes--taken directly from `regex.h'--are:

     #define RE_SYNTAX_EMACS 0
     
     #define RE_SYNTAX_AWK                                                   \
       (RE_BACKSLASH_ESCAPE_IN_LISTS | RE_DOT_NOT_NULL                       \
        | RE_NO_BK_PARENS            | RE_NO_BK_REFS                         \
        | RE_NO_BK_VBAR               | RE_NO_EMPTY_RANGES                   \
        | RE_UNMATCHED_RIGHT_PAREN_ORD)
     
     #define RE_SYNTAX_POSIX_AWK                                             \
       (RE_SYNTAX_POSIX_EXTENDED | RE_BACKSLASH_ESCAPE_IN_LISTS)
     
     #define RE_SYNTAX_GREP                                                  \
       (RE_BK_PLUS_QM              | RE_CHAR_CLASSES                         \
        | RE_HAT_LISTS_NOT_NEWLINE | RE_INTERVALS                            \
        | RE_NEWLINE_ALT)
     
     #define RE_SYNTAX_EGREP                                                 \
       (RE_CHAR_CLASSES        | RE_CONTEXT_INDEP_ANCHORS                    \
        | RE_CONTEXT_INDEP_OPS | RE_HAT_LISTS_NOT_NEWLINE                    \
        | RE_NEWLINE_ALT       | RE_NO_BK_PARENS                             \
        | RE_NO_BK_VBAR)
     
     #define RE_SYNTAX_POSIX_EGREP                                           \
       (RE_SYNTAX_EGREP | RE_INTERVALS | RE_NO_BK_BRACES)
     
     /* P1003.2/D11.2, section 4.20.7.1, lines 5078ff.  */
     #define RE_SYNTAX_ED RE_SYNTAX_POSIX_BASIC
     
     #define RE_SYNTAX_SED RE_SYNTAX_POSIX_BASIC
     
     /* Syntax bits common to both basic and extended POSIX regex syntax.  */
     #define _RE_SYNTAX_POSIX_COMMON                                         \
       (RE_CHAR_CLASSES | RE_DOT_NEWLINE      | RE_DOT_NOT_NULL              \
        | RE_INTERVALS  | RE_NO_EMPTY_RANGES)
     
     #define RE_SYNTAX_POSIX_BASIC                                           \
       (_RE_SYNTAX_POSIX_COMMON | RE_BK_PLUS_QM)
     
     /* Differs from ..._POSIX_BASIC only in that RE_BK_PLUS_QM becomes
        RE_LIMITED_OPS, i.e., \? \+ \| are not recognized.  Actually, this
        isn't minimal, since other operators, such as \`, aren't disabled.  */
     #define RE_SYNTAX_POSIX_MINIMAL_BASIC                                   \
       (_RE_SYNTAX_POSIX_COMMON | RE_LIMITED_OPS)
     
     #define RE_SYNTAX_POSIX_EXTENDED                                        \
       (_RE_SYNTAX_POSIX_COMMON | RE_CONTEXT_INDEP_ANCHORS                   \
        | RE_CONTEXT_INDEP_OPS  | RE_NO_BK_BRACES                            \
        | RE_NO_BK_PARENS       | RE_NO_BK_VBAR                              \
        | RE_UNMATCHED_RIGHT_PAREN_ORD)
     
     /* Differs from ..._POSIX_EXTENDED in that RE_CONTEXT_INVALID_OPS
        replaces RE_CONTEXT_INDEP_OPS and RE_NO_BK_REFS is added.  */
     #define RE_SYNTAX_POSIX_MINIMAL_EXTENDED                                \
       (_RE_SYNTAX_POSIX_COMMON  | RE_CONTEXT_INDEP_ANCHORS                  \
        | RE_CONTEXT_INVALID_OPS | RE_NO_BK_BRACES                           \
        | RE_NO_BK_PARENS        | RE_NO_BK_REFS                             \
        | RE_NO_BK_VBAR          | RE_UNMATCHED_RIGHT_PAREN_ORD)
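
  To see how two predefined syntaxes differ in practice, the following
sketch compiles one pattern under both the POSIX basic and the POSIX
extended syntax, using `regcomp' with and without `REG_EXTENDED'
(*note POSIX Regular Expression Compiling::).  The names `try_syntax'
and `compare_syntaxes' are hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if PATTERN, compiled with CFLAGS, matches somewhere in
   TEXT, 0 if it does not, and -1 if PATTERN fails to compile.  */
static int
try_syntax (const char *pattern, int cflags, const char *text)
{
  regex_t re;
  int rc;

  if (regcomp (&re, pattern, cflags | REG_NOSUB) != 0)
    return -1;
  rc = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}

/* Store in *BASIC the result of matching under the POSIX basic syntax
   and in *EXTENDED the result under the POSIX extended syntax.
   Hypothetical helper, not part of Regex.  */
void
compare_syntaxes (const char *pattern, const char *text,
                  int *basic, int *extended)
{
  *basic = try_syntax (pattern, 0, text);
  *extended = try_syntax (pattern, REG_EXTENDED, text);
}
```

  For the pattern `a^b' and the string `a^b', this yields 1 for basic
and 0 for extended: only the extended syntax sets
`RE_CONTEXT_INDEP_ANCHORS', so only there is the `^' in the middle an
anchor.  For `^(a|b)$' and the string `b', it yields 0 for basic and 1
for extended, because of `RE_NO_BK_PARENS' and `RE_NO_BK_VBAR'.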


File: regex.info,  Node: Collating Elements vs. Characters,  Next: The Backslash Character,  Prev: Predefined Syntaxes,  Up: Regular Expression Syntax

Collating Elements vs. Characters
=================================

  POSIX generalizes the notion of a character to that of a collating
element.  It defines a "collating element" to be "a sequence of one or
more bytes defined in the current collating sequence as a unit of
collation."

  This generalizes the notion of a character in two ways.  First, a
single character can map into two or more collating elements.  For
example, the German "es-zet" collates as the collating element `s'
followed by another collating element `s'.  Second, two or more
characters can map into one collating element.  For example, the
Spanish `ll' collates after `l' and before `m'.

  Since POSIX's "collating element" preserves the essential idea of a
"character," we use the latter, more familiar, term in this document.


File: regex.info,  Node: The Backslash Character,  Prev: Collating Elements vs. Characters,  Up: Regular Expression Syntax

The Backslash Character
=======================

  The `\' character has one of four different meanings, depending on
the context in which you use it and what syntax bits are set (*note
Syntax Bits::).  It can: 1) stand for itself, 2) quote the next
character, 3) introduce an operator, or 4) do nothing.

  1. It stands for itself inside a list (*note List Operators::) if the
     syntax bit `RE_BACKSLASH_ESCAPE_IN_LISTS' is not set.  For
     example, `[\]' would match `\'.

  2. It quotes (makes ordinary, if it's special) the next character
     when you use it either:

        * outside a list,(1) or

        * inside a list and the syntax bit
          `RE_BACKSLASH_ESCAPE_IN_LISTS' is set.


  3. It introduces an operator when followed by certain ordinary
     characters--sometimes only when certain syntax bits are set.  See
     the cases `RE_BK_PLUS_QM', `RE_NO_BK_BRACES', `RE_NO_BK_VBAR',
     `RE_NO_BK_PARENS', `RE_NO_BK_REFS' in *Note Syntax Bits::.  Also:

        * `\b' represents the match-word-boundary operator (*note
          Match-word-boundary Operator::).

        * `\B' represents the match-within-word operator (*note
          Match-within-word Operator::).

        * `\<' represents the match-beginning-of-word operator
          (*note Match-beginning-of-word Operator::).

        * `\>' represents the match-end-of-word operator (*note
          Match-end-of-word Operator::).

        * `\w' represents the match-word-constituent operator (*note
          Match-word-constituent Operator::).

        * `\W' represents the match-non-word-constituent operator
          (*note Match-non-word-constituent Operator::).

        * `\`' represents the match-beginning-of-buffer operator and
          `\'' represents the match-end-of-buffer operator (*note
          Buffer Operators::).

        * If Regex was compiled with the C preprocessor symbol `emacs'
          defined, then `\sCLASS' represents the match-syntactic-class
          operator and `\SCLASS' represents the
          match-not-syntactic-class operator (*note Syntactic Class
          Operators::).


  4. In all other cases, Regex ignores `\'.  For example, `\n' matches
     `n'.
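
  The first two meanings can be sketched with the POSIX functions
under the POSIX basic syntax, which does not set
`RE_BACKSLASH_ESCAPE_IN_LISTS'.  Note that each `\' is doubled in the
C string literals.  `basic_found' is a hypothetical helper.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if PATTERN (POSIX basic syntax) matches somewhere in TEXT,
   0 if it does not, and -1 if PATTERN fails to compile.
   Hypothetical helper, not part of Regex.  */
int
basic_found (const char *pattern, const char *text)
{
  regex_t re;
  int rc;

  if (regcomp (&re, pattern, REG_NOSUB) != 0)
    return -1;
  rc = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

  For example:

     basic_found ("a.b", "axb")     => 1   `.' is special
     basic_found ("a\\.b", "axb")   => 0   `\' quotes the `.'
     basic_found ("a\\.b", "a.b")   => 1
     basic_found ("[\\]", "\\")     => 1   `\' stands for itself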


  ---------- Footnotes ----------

  (1) Sometimes you don't have to explicitly quote special characters
to make them ordinary.  For instance, most characters lose any special
meaning inside a list (*note List Operators::).  In addition, if the
syntax bits `RE_CONTEXT_INVALID_OPS' and `RE_CONTEXT_INDEP_OPS' aren't
set, then (for historical reasons) the matcher considers special
characters ordinary if they are in contexts where the operations they
represent make no sense; for example, the match-zero-or-more
operator (represented by `*') matches itself in the regular expression
`*foo' because there is no preceding expression on which it can
operate.  It is poor practice, however, to depend on this behavior; if
you want a special character to be ordinary outside a list, it's better
to always quote it, regardless.


File: regex.info,  Node: Common Operators,  Next: GNU Operators,  Prev: Regular Expression Syntax,  Up: Top

Common Operators
****************

  You compose regular expressions from operators.  In the following
sections, we describe the regular expression operators specified by
POSIX; GNU also uses these.  Most operators have more than one
representation as characters.  *Note Regular Expression Syntax::, for
what characters represent what operators under what circumstances.

  For most operators that can be represented in two ways, one
representation is a single character and the other is that character
preceded by `\'.  For example, either `(' or `\(' represents the
open-group operator.  Which one does depends on the setting of a syntax
bit, in this case `RE_NO_BK_PARENS'.  Why is this so?  Historical
reasons dictate some of the varying representations, while POSIX
dictates others.

  Finally, almost all characters lose any special meaning inside a list
(*note List Operators::).

* Menu:

* Match-self Operator::                 Ordinary characters.
* Match-any-character Operator::        .
* Concatenation Operator::              Juxtaposition.
* Repetition Operators::                *  +  ? {}
* Alternation Operator::                |
* List Operators::                      [...]  [^...]
* Grouping Operators::                  (...)
* Back-reference Operator::             \digit
* Anchoring Operators::                 ^  $


File: regex.info,  Node: Match-self Operator,  Next: Match-any-character Operator,  Up: Common Operators

The Match-self Operator (ORDINARY CHARACTER)
============================================

  This operator matches the character itself.  All ordinary characters
(*note Regular Expression Syntax::) represent this operator.  For
example, `f' is always an ordinary character, so the regular expression
`f' matches only the string `f'.  In particular, it does _not_ match
the string `ff'.


File: regex.info,  Node: Match-any-character Operator,  Next: Concatenation Operator,  Prev: Match-self Operator,  Up: Common Operators

The Match-any-character Operator (`.')
======================================

  This operator matches any single printing or nonprinting character,
except that it won't match a:

newline
     if the syntax bit `RE_DOT_NEWLINE' isn't set.

null
     if the syntax bit `RE_DOT_NOT_NULL' is set.

  The `.' (period) character represents this operator.  For example,
`a.b' matches any three-character string beginning with `a' and ending
with `b'.
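
  A minimal sketch of the newline case, using the POSIX functions: the
POSIX syntaxes set `RE_DOT_NEWLINE', and passing `REG_NEWLINE' to
`regcomp' makes `.' stop matching a newline.  The null case is not
shown, since `regexec' takes a null-terminated string.  `dot_found' is
a hypothetical helper.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if the basic-syntax PATTERN matches somewhere in TEXT, 0
   if it does not, and -1 if PATTERN fails to compile.  EXTRA_CFLAGS is
   passed on to `regcomp'; pass REG_NEWLINE to stop `.' from matching a
   newline.  Hypothetical helper, not part of Regex.  */
int
dot_found (const char *pattern, int extra_cflags, const char *text)
{
  regex_t re;
  int rc;

  if (regcomp (&re, pattern, extra_cflags | REG_NOSUB) != 0)
    return -1;
  rc = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

  For example:

     dot_found ("^a.b$", 0, "axb")              => 1
     dot_found ("^a.b$", 0, "ab")               => 0
     dot_found ("^a.b$", 0, "a\nb")             => 1
     dot_found ("^a.b$", REG_NEWLINE, "a\nb")   => 0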


File: regex.info,  Node: Concatenation Operator,  Next: Repetition Operators,  Prev: Match-any-character Operator,  Up: Common Operators

The Concatenation Operator
==========================

  This operator concatenates two regular expressions A and B.  No
character represents this operator; you simply put B after A.  The
result is a regular expression that will match a string if A matches
its first part and B matches the rest.  For example, `xy' (two
match-self operators) matches `xy'.


File: regex.info,  Node: Repetition Operators,  Next: Alternation Operator,  Prev: Concatenation Operator,  Up: Common Operators

Repetition Operators
====================

  Repetition operators repeat the preceding regular expression a
specified number of times.

* Menu:

* Match-zero-or-more Operator::  *
* Match-one-or-more Operator::   +
* Match-zero-or-one Operator::   ?
* Interval Operators::           {}


File: regex.info,  Node: Match-zero-or-more Operator,  Next: Match-one-or-more Operator,  Up: Repetition Operators

The Match-zero-or-more Operator (`*')
-------------------------------------

  This operator repeats the smallest possible preceding regular
expression as many times as necessary (including zero) to match the
pattern.  `*' represents this operator.  For example, `o*' matches any
string made up of zero or more `o's.  Since this operator operates on
the smallest preceding regular expression, `fo*' has a repeating `o',
not a repeating `fo'.  So, `fo*' matches `f', `fo', `foo', and so on.

  Since the match-zero-or-more operator is a suffix operator, it may be
useless as such when no regular expression precedes it.  This is the
case when it:

   * is first in a regular expression, or

   * follows a match-beginning-of-line, open-group, or alternation
     operator.


Three different things can happen in these cases:

  1. If the syntax bit `RE_CONTEXT_INVALID_OPS' is set, then the
     regular expression is invalid.

  2. If `RE_CONTEXT_INVALID_OPS' isn't set, but `RE_CONTEXT_INDEP_OPS'
     is, then `*' represents the match-zero-or-more operator (which
     then operates on the empty string).

  3. Otherwise, `*' is ordinary.


  The matcher processes a match-zero-or-more operator by first matching
as many repetitions of the smallest preceding regular expression as it
can.  Then it continues to match the rest of the pattern.

  If it can't match the rest of the pattern, it backtracks (as many
times as necessary), each time discarding one of the matches until it
can either match the entire pattern or be certain that it cannot get a
match.  For example, when matching `ca*ar' against `caaar', the matcher
first matches all three `a's of the string with the `a*' of the regular
expression.  However, it cannot then match the final `ar' of the
regular expression against the final `r' of the string.  So it
backtracks, discarding the match of the last `a' in the string.  It can
then match the remaining `ar'.
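
  The outcome of that example can be checked with the POSIX functions,
as in the sketch below.  It reports only where the final match starts
and ends; it does not show the intermediate steps, and a given
implementation may reach the same result by other means.  `match_span'
is a hypothetical helper.

```c
#include <regex.h>

/* Search TEXT for the basic-syntax PATTERN.  On success store the
   start and end offsets of the match in *START and *END and return 1;
   return 0 if nothing matches and -1 if PATTERN does not compile.
   Hypothetical helper, not part of Regex.  */
int
match_span (const char *pattern, const char *text, int *start, int *end)
{
  regex_t re;
  regmatch_t whole[1];
  int rc;

  if (regcomp (&re, pattern, 0) != 0)
    return -1;
  rc = regexec (&re, text, 1, whole, 0);
  regfree (&re);
  if (rc != 0)
    return 0;
  *start = (int) whole[0].rm_so;
  *end = (int) whole[0].rm_eo;
  return 1;
}
```

  Matching `ca*ar' against `caaar' gives the offsets 0 and 5, i.e., the
whole string: `a*' ends up with two of the three `a's.  Matching `fo*'
against `f' gives 0 and 1, and against `foo' gives 0 and 3.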


File: regex.info,  Node: Match-one-or-more Operator,  Next: Match-zero-or-one Operator,  Prev: Match-zero-or-more Operator,  Up: Repetition Operators

The Match-one-or-more Operator (`+' or `\+')
--------------------------------------------

  If the syntax bit `RE_LIMITED_OPS' is set, then Regex doesn't
recognize this operator.  Otherwise, if the syntax bit `RE_BK_PLUS_QM'
isn't set, then `+' represents this operator; if it is, then `\+' does.

  This operator is similar to the match-zero-or-more operator except
that it repeats the preceding regular expression at least once; *note
Match-zero-or-more Operator::, for what it operates on, how some syntax
bits affect it, and how Regex backtracks to match it.

  For example, supposing that `+' represents the match-one-or-more
operator, `ca+r' matches `car' and `caaaar', but not `cr'.
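
  The same example, sketched with the POSIX functions under the POSIX
extended syntax, where `+' represents this operator (that syntax sets
neither `RE_LIMITED_OPS' nor `RE_BK_PLUS_QM').  The anchors `^' and `$'
make the pattern match the string as a whole.  `whole_match' is a
hypothetical helper.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if the extended-syntax PATTERN matches in TEXT, 0 if it
   does not, and -1 if PATTERN fails to compile.  Anchor PATTERN with
   `^' and `$' to test the string as a whole.
   Hypothetical helper, not part of Regex.  */
int
whole_match (const char *pattern, const char *text)
{
  regex_t re;
  int rc;

  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  rc = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

  For example:

     whole_match ("^ca+r$", "car")      => 1
     whole_match ("^ca+r$", "caaaar")   => 1
     whole_match ("^ca+r$", "cr")       => 0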


File: regex.info,  Node: Match-zero-or-one Operator,  Next: Interval Operators,  Prev: Match-one-or-more Operator,  Up: Repetition Operators

The Match-zero-or-one Operator (`?' or `\?')
--------------------------------------------

  If the syntax bit `RE_LIMITED_OPS' is set, then Regex doesn't
recognize this operator.  Otherwise, if the syntax bit `RE_BK_PLUS_QM'
isn't set, then `?' represents this operator; if it is, then `\?' does.

  This operator is similar to the match-zero-or-more operator except
that it repeats the preceding regular expression once or not at all;
*note Match-zero-or-more Operator::, to see what it operates on, how
some syntax bits affect it, and how Regex backtracks to match it.

  For example, supposing that `?' represents the match-zero-or-one
operator, `ca?r' matches both `car' and `cr', but nothing else.


File: regex.info,  Node: Interval Operators,  Prev: Match-zero-or-one Operator,  Up: Repetition Operators

Interval Operators (`{' ... `}' or `\{' ... `\}')
-------------------------------------------------

  If the syntax bit `RE_INTERVALS' is set, then Regex recognizes
"interval expressions".  They repeat the smallest possible preceding
regular expression a specified number of times.

  If the syntax bit `RE_NO_BK_BRACES' is set, `{' represents the
"open-interval operator" and `}' represents the "close-interval
operator"; otherwise, `\{' and `\}' do.

  Specifically, supposing that `{' and `}' represent the open-interval
and close-interval operators:

`{COUNT}'
     matches exactly COUNT occurrences of the preceding regular
     expression.

`{MIN,}'
     matches MIN or more occurrences of the preceding regular
     expression.

`{MIN,MAX}'
     matches at least MIN but no more than MAX occurrences of the
     preceding regular expression.

  The interval expression (but not necessarily the regular expression
that contains it) is invalid if:

   * MIN is greater than MAX, or

   * any of COUNT, MIN, or MAX is outside the range zero to
     `RE_DUP_MAX' (which symbol `regex.h' defines).


  If the interval expression is invalid and the syntax bit
`RE_NO_BK_BRACES' is set, then Regex considers all the characters in
the would-be interval to be ordinary.  If that bit isn't set, then the
regular expression is invalid.

  If the interval expression is valid but there is no preceding regular
expression on which to operate, then if the syntax bit
`RE_CONTEXT_INVALID_OPS' is set, the regular expression is invalid.  If
that bit isn't set, then Regex considers all the characters--other than
backslashes, which it ignores--in the would-be interval to be ordinary.
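
  The three valid forms can be sketched with the POSIX functions under
the POSIX extended syntax, which sets both `RE_INTERVALS' and
`RE_NO_BK_BRACES'.  Only valid intervals are shown; how an invalid one
is treated depends on the syntax bits, as described above.
`interval_found' is a hypothetical helper.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if the extended-syntax PATTERN matches in TEXT, 0 if it
   does not, and -1 if PATTERN fails to compile.
   Hypothetical helper, not part of Regex.  */
int
interval_found (const char *pattern, const char *text)
{
  regex_t re;
  int rc;

  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  rc = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

  For example:

     interval_found ("^a{2}$", "aa")      => 1   exactly two
     interval_found ("^a{2}$", "aaa")     => 0
     interval_found ("^a{2,}$", "aaa")    => 1   two or more
     interval_found ("^a{2,}$", "a")      => 0
     interval_found ("^a{1,2}$", "aa")    => 1   one or two
     interval_found ("^a{1,2}$", "aaa")   => 0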


File: regex.info,  Node: Alternation Operator,  Next: List Operators,  Prev: Repetition Operators,  Up: Common Operators

The Alternation Operator (`|' or `\|')
======================================

  If the syntax bit `RE_LIMITED_OPS' is set, then Regex doesn't
recognize this operator.  Otherwise, if the syntax bit `RE_NO_BK_VBAR'
is set, then `|' represents this operator; otherwise, `\|' does.

  Alternatives match one of a choice of regular expressions: if you put
the character(s) representing the alternation operator between any two
regular expressions A and B, the result matches the union of the
strings that A and B match.  For example, supposing that `|' is the
alternation operator, then `foo|bar|quux' would match any of `foo',
`bar' or `quux'.

  The alternation operator operates on the _largest_ possible
surrounding regular expressions.  (Put another way, it has the lowest
precedence of any regular expression operator.)  Thus, the only way you
can delimit its arguments is to use grouping.  For example, if `(' and
`)' are the open and close-group operators, then `fo(o|b)ar' would
match either `fooar' or `fobar'.  (`foo|bar' would match `foo' or
`bar'.)

  The matcher usually tries all combinations of alternatives so as to
match the longest possible string.  For example, when matching
`(fooq|foo)*(qbarquux|bar)' against `fooqbarquux', it cannot take, say,
the first ("depth-first") combination it could match, since then it
would be content to match just `fooqbar'.
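
  The examples in this section can be checked with a sketch such as
the following.  It assumes the POSIX interface with `REG_EXTENDED', so
that `|', `(' and `)' represent the alternation and grouping operators;
`match_length' is a hypothetical helper, not a Regex function.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return the length of the leftmost match of PATTERN (POSIX extended
   syntax) in TEXT, or -1 if there is no match or PATTERN is invalid.  */
static int
match_length (const char *pattern, const char *text)
{
  regex_t re;
  regmatch_t m;
  int len = -1;

  assert (pattern != NULL && text != NULL);
  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;
  if (regexec (&re, text, 1, &m, 0) == 0)
    len = (int) (m.rm_eo - m.rm_so);
  regfree (&re);
  return len;
}
```

  Here `match_length ("(fooq|foo)*(qbarquux|bar)", "fooqbarquux")' is
expected to be 11, the whole string, rather than the 7 characters of
`fooqbar'; and `fo(o|b)ar' is expected to match `fobar' but not `foo'.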


File: regex.info,  Node: List Operators,  Next: Grouping Operators,  Prev: Alternation Operator,  Up: Common Operators

List Operators (`[' ... `]' and `[^' ... `]')
=============================================

  "Lists", also called "bracket expressions", are a set of one or more
items.  An "item" is a character, a character class expression, or a
range expression.  The syntax bits affect which kinds of items you can
put in a list.  We explain the last two items in subsections below.
Empty lists are invalid.

  A "matching list" matches a single character represented by one of
the list items.  You form a matching list by enclosing one or more items
within an "open-matching-list operator" (represented by `[') and a
"close-list operator" (represented by `]').

  For example, `[ab]' matches either `a' or `b'.  `[ad]*' matches the
empty string and any string composed of just `a's and `d's in any
order.  Regex considers invalid a regular expression with a `[' but no
matching `]'.

  "Nonmatching lists" are similar to matching lists except that they
match a single character _not_ represented by one of the list items.
You use an "open-nonmatching-list operator" (represented by `[^'(1))
instead of an open-matching-list operator to start a nonmatching list.

  For example, `[^ab]' matches any character except `a' or `b'.

  If the `posix_newline' field in the pattern buffer (*note GNU Pattern
Buffers::) is set, then nonmatching lists do not match a newline.

  Most characters lose any special meaning inside a list.  The special
characters inside a list follow.

`]'
     ends the list if it's not the first list item.  So, if you want to
     make the `]' character a list item, you must put it first.

`\'
     quotes the next character if the syntax bit
     `RE_BACKSLASH_ESCAPE_IN_LISTS' is set.

`[:'
     represents the open-character-class operator (*note Character
     Class Operators::) if the syntax bit `RE_CHAR_CLASSES' is set and
     what follows is a valid character class expression.

`:]'
     represents the close-character-class operator if the syntax bit
     `RE_CHAR_CLASSES' is set and what precedes it is an
     open-character-class operator followed by a valid character class
     name.

`-'
     represents the range operator (*note Range Operator::) if it's not
     first or last in a list or the ending point of a range.

All other characters are ordinary.  For example, `[.*]' matches `.' and
`*'.
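
  As a rough illustration of matching and nonmatching lists, the sketch
below tests a single character against a list.  It assumes the POSIX
interface with `REG_EXTENDED' (under which `\' is not special inside a
list); `list_accepts' is a hypothetical name.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if the list LIST matches the single character C, 0 if it
   does not, and -1 if LIST is not a valid regular expression.  */
static int
list_accepts (const char *list, char c)
{
  regex_t re;
  char text[2];
  int found;

  assert (list != NULL && c != '\0');
  if (regcomp (&re, list, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;                  /* e.g. a `[' with no matching `]' */
  text[0] = c;
  text[1] = '\0';
  found = regexec (&re, text, 0, NULL, 0) == 0;
  regfree (&re);
  return found;
}
```

  For instance, `list_accepts ("[]a]", ']')' and `list_accepts ("[.*]",
'*')' should both be 1, while `list_accepts ("[ab", 'a')' should be -1.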

* Menu:

* Character Class Operators::   [:class:]
* Range Operator::          start-end

  ---------- Footnotes ----------

  (1) Regex therefore doesn't consider the `^' to be the first
character in the list.  If you put a `^' character first in (what you
think is) a matching list, you'll turn it into a nonmatching list.


File: regex.info,  Node: Character Class Operators,  Next: Range Operator,  Up: List Operators

Character Class Operators (`[:' ... `:]')
-----------------------------------------

  If the syntax bit `RE_CHAR_CLASSES' is set, then Regex
recognizes character class expressions inside lists.  A "character
class expression" matches one character from a given class.  You form a
character class expression by putting a character class name between an
"open-character-class operator" (represented by `[:') and a
"close-character-class operator" (represented by `:]').  The character
class names and their meanings are:

`alnum'
     letters and digits

`alpha'
     letters

`blank'
     system-dependent; for GNU, a space or tab

`cntrl'
     control characters (in the ASCII encoding, code 0177 and codes
     less than 040)

`digit'
     digits

`graph'
     same as `print' except omits space

`lower'
     lowercase letters

`print'
     printable characters (in the ASCII encoding, space through tilde:
     codes 040 through 0176)

`punct'
     neither control nor alphanumeric characters

`space'
     space, tab, carriage return, newline, vertical tab, and form feed

`upper'
     uppercase letters

`xdigit'
     hexadecimal digits: `0'-`9', `a'-`f', `A'-`F'

These correspond to the definitions in the C library's `<ctype.h>'
facility.  For example, `[:alpha:]' corresponds to the standard
facility `isalpha'.  Regex recognizes character class expressions only
inside of lists; so `[[:alpha:]]' matches any letter, but `[:alpha:]'
on its own is merely a list whose items are the characters `:', `a',
`l', `p', and `h'.
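
  The correspondence with `<ctype.h>' can be checked with a sketch like
the one below.  It assumes the POSIX interface with `REG_EXTENDED', the
default "C" locale, and the ASCII encoding; `class_mismatches' is a
hypothetical helper.

```c
#include <assert.h>
#include <ctype.h>
#include <regex.h>
#include <stddef.h>

/* Count the character codes 1 through 127 on which the list LIST and
   the <ctype.h> function PREDICATE disagree.  Returns -1 if LIST does
   not compile.  */
static int
class_mismatches (const char *list, int (*predicate) (int))
{
  regex_t re;
  char text[2] = { '\0', '\0' };
  int c, mismatches = 0;

  assert (list != NULL && predicate != NULL);
  if (regcomp (&re, list, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  for (c = 1; c < 128; c++)
    {
      int by_regex, by_ctype;

      text[0] = (char) c;
      by_regex = regexec (&re, text, 0, NULL, 0) == 0;
      by_ctype = predicate (c) != 0;
      if (by_regex != by_ctype)
        mismatches++;
    }
  regfree (&re);
  return mismatches;
}
```

  Under those assumptions, `class_mismatches ("[[:alpha:]]", isalpha)'
should be zero, and likewise for `[[:digit:]]' with `isdigit'.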


File: regex.info,  Node: Range Operator,  Prev: Character Class Operators,  Up: List Operators

The Range Operator (`-')
------------------------

  Regex recognizes "range expressions" inside a list.  They represent
those characters that fall between two elements in the current
collating sequence.  You form a range expression by putting a "range
operator" between two characters.(1) `-' represents the range operator.
For example, `a-f' within a list represents all the characters from `a'
through `f' inclusively.

  If the syntax bit `RE_NO_EMPTY_RANGES' is set, then if the range's
ending point collates less than its starting point, the range (and the
regular expression containing it) is invalid.  For example, the regular
expression `[z-a]' would be invalid.  If this bit isn't set, then Regex
considers such a range to be empty.

  Since `-' represents the range operator, if you want to make a `-'
character itself a list item, you must do one of the following:

   * Put the `-' either first or last in the list.

   * Include a range whose starting point collates strictly lower than
     `-' and whose ending point collates equal or higher.  Unless a
     range is the first item in a list, a `-' can't be its starting
     point, but _can_ be its ending point.  That is because Regex
     considers `-' to be the range operator unless it is preceded by
     another `-'.  For example, in the ASCII encoding, `)', `*', `+',
     `,', `-', `.', and `/' are contiguous characters in the collating
     sequence.  You might think that `[)-+--/]' has two ranges: `)-+'
     and `--/'.  Rather, it has the ranges `)-+' and `+--', plus the
     character `/', so it matches, e.g., `,', not `.'.

   * Put a range whose starting point is `-' first in the list.


  For example, `[-a-z]' matches a lowercase letter or a hyphen (in
English, in ASCII).
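
  A short sketch of ranges and of a literal `-' follows.  It assumes
the POSIX interface with `REG_EXTENDED', the "C" locale, and an
implementation (such as the GNU C library) whose POSIX syntaxes treat
an empty range as invalid; `range_accepts' is a hypothetical name.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if the list LIST matches the single character C, 0 if it
   does not, and -1 if LIST is rejected (e.g. an empty range).  */
static int
range_accepts (const char *list, char c)
{
  regex_t re;
  char text[2];
  int found;

  assert (list != NULL && c != '\0');
  if (regcomp (&re, list, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  text[0] = c;
  text[1] = '\0';
  found = regexec (&re, text, 0, NULL, 0) == 0;
  regfree (&re);
  return found;
}
```

  Both `[-a-z]' and `[a-z-]' should accept `-' as well as a lowercase
letter, and `[z-a]' should fail to compile under these assumptions.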

  ---------- Footnotes ----------

  (1) You can't use a character class for the starting or ending point
of a range, since a character class is not a single character.


File: regex.info,  Node: Grouping Operators,  Next: Back-reference Operator,  Prev: List Operators,  Up: Common Operators

Grouping Operators (`(' ... `)' or `\(' ... `\)')
=================================================

  A "group", also known as a "subexpression", consists of an
"open-group operator", any number of other operators, and a
"close-group operator".  Regex treats this sequence as a unit, just as
mathematics and programming languages treat a parenthesized expression
as a unit.

  Therefore, using "groups", you can:

   * delimit the argument(s) to an alternation operator (*note
     Alternation Operator::) or a repetition operator (*note Repetition
     Operators::).

   * keep track of the indices of the substring that matched a given
     group.  *Note Using Registers::, for a precise explanation.  This
     lets you:

        * use the back-reference operator (*note Back-reference
          Operator::).

        * use registers (*note Using Registers::).


  If the syntax bit `RE_NO_BK_PARENS' is set, then `(' represents the
open-group operator and `)' represents the close-group operator;
otherwise, `\(' and `\)' do.

  If the syntax bit `RE_UNMATCHED_RIGHT_PAREN_ORD' is set and a
close-group operator has no matching open-group operator, then Regex
considers it to match `)'.
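
  To illustrate how groups let you keep track of matched substrings,
here is a sketch that uses the POSIX interface, where the `regmatch_t'
array filled in by `regexec' plays the role of the registers.  It
assumes `REG_EXTENDED', so `(' and `)' are the group operators;
`group_text' is a hypothetical helper.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Match PATTERN against TEXT and copy the substring matched by group
   GROUP (0 is the whole match) into OUT, truncating to OUTSIZE - 1.
   Returns the number of characters copied, or -1 if there is no match,
   the group did not participate, or PATTERN is invalid.  */
static int
group_text (const char *pattern, const char *text, size_t group,
            char *out, size_t outsize)
{
  regex_t re;
  regmatch_t m[10];
  int len = -1;

  assert (pattern != NULL && text != NULL && out != NULL);
  assert (outsize > 0 && group < 10);
  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;
  if (regexec (&re, text, 10, m, 0) == 0 && m[group].rm_so != -1)
    {
      len = (int) (m[group].rm_eo - m[group].rm_so);
      if ((size_t) len >= outsize)
        len = (int) outsize - 1;
      memcpy (out, text + m[group].rm_so, (size_t) len);
      out[len] = '\0';
    }
  regfree (&re);
  return len;
}
```

  Matching `([a-z]+)=([0-9]+)' against `key=42' this way should yield
`key' for group 1 and `42' for group 2.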


File: regex.info,  Node: Back-reference Operator,  Next: Anchoring Operators,  Prev: Grouping Operators,  Up: Common Operators

The Back-reference Operator ("\"DIGIT)
======================================

  If the syntax bit `RE_NO_BK_REFS' isn't set, then Regex recognizes
back references.  A back reference matches a specified preceding group.
The back reference operator is represented by `\DIGIT' anywhere after
the end of a regular expression's DIGIT-th group (*note Grouping
Operators::).

  DIGIT must be between `1' and `9'.  The matcher assigns numbers 1
through 9 to the first nine groups it encounters.  By using one of `\1'
through `\9' after the corresponding group's close-group operator, you
can match a substring identical to the one that the group does.

  Back references match according to the following (in all examples
below, `(' represents the open-group, `)' the close-group, `{' the
open-interval and `}' the close-interval operator):

   * If the group matches a substring, the back reference matches an
     identical substring.  For example, `(a)\1' matches `aa' and
     `(bana)na\1bo\1' matches `bananabanabobana'.  Likewise, `(.*)\1'
     matches any (newline-free if the syntax bit `RE_DOT_NEWLINE' isn't
     set) string that is composed of two identical halves; the `(.*)'
     matches the first half and the `\1' matches the second half.

   * If the group matches more than once (as it might if followed by,
     e.g., a repetition operator), then the back reference matches the
     substring the group _last_ matched.  For example, `((a*)b)*\1\2'
     matches `aabababa'; first group 1 (the outer one) matches `aab'
     and group 2 (the inner one) matches `aa'.  Then group 1 matches
     `ab' and group 2 matches `a'.  So, `\1' matches `ab' and `\2'
     matches `a'.

   * If the group doesn't participate in a match, i.e., it is part of an
     alternative not taken or a repetition operator allows zero
     repetitions of it, then the back reference makes the whole match
     fail.  For example, `(one()|two())-and-(three\2|four\3)' matches
     `one-and-three' and `two-and-four', but not `one-and-four' or
     `two-and-three'.  To see why: if the pattern matches `one-and-',
     then its group 2 matches the empty string and its group 3 doesn't
     participate in the match.  So, if it then matches `four', then
     when it tries to back reference group 3--which it will attempt to
     do because `\3' follows the `four'--the match will fail because
     group 3 didn't participate in the match.


  You can use a back reference as an argument to a repetition operator.
For example, `(a(b))\2*' matches `a' followed by one or more `b's.
Similarly, `(a(b))\2{3}' matches `abbbb'.

  If there is no preceding DIGIT-th subexpression, the regular
expression is invalid.
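
  A few of the back-reference examples above can be tried with the
sketch below.  It assumes the POSIX interface in its basic syntax
(flags of zero), so the groups are written `\(' ... `\)', and it
anchors each pattern with `^' and `$' so that the whole string must
match.  `bre_matches' is a hypothetical helper.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if PATTERN (POSIX basic syntax) matches somewhere in TEXT,
   0 if it does not, and -1 if PATTERN is invalid, for instance because
   it refers back to a group that does not precede it.  */
static int
bre_matches (const char *pattern, const char *text)
{
  regex_t re;
  int found;

  assert (pattern != NULL && text != NULL);
  if (regcomp (&re, pattern, 0) != 0)
    return -1;
  found = regexec (&re, text, 0, NULL, 0) == 0;
  regfree (&re);
  return found;
}
```

  For instance, `^\(.*\)\1$' should accept `abcabc' but not `abcabd',
and `^\(a\)\2$' should be rejected because there is no second group.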


File: regex.info,  Node: Anchoring Operators,  Prev: Back-reference Operator,  Up: Common Operators

Anchoring Operators
===================

  These operators can constrain a pattern to match only at the
beginning or end of the entire string or at the beginning or end of a
line.

* Menu:

* Match-beginning-of-line Operator::  ^
* Match-end-of-line Operator::        $


File: regex.info,  Node: Match-beginning-of-line Operator,  Next: Match-end-of-line Operator,  Up: Anchoring Operators

The Match-beginning-of-line Operator (`^')
------------------------------------------

  This operator can match the empty string either at the beginning of
the string or after a newline character.  Thus, it is said to "anchor"
the pattern to the beginning of a line.

  In the cases following, `^' represents this operator.  (Otherwise,
`^' is ordinary.)

   * It (the `^') is first in the pattern, as in `^foo'.

   * The syntax bit `RE_CONTEXT_INDEP_ANCHORS' is set, and it is outside
     a bracket expression.

   * It follows an open-group or alternation operator, as in `a\(^b\)'
     and `a\|^b'.  *Note Grouping Operators::, and *Note Alternation
     Operator::.


  These rules imply that some valid patterns containing `^' cannot be
matched; for example, `foo^bar' if `RE_CONTEXT_INDEP_ANCHORS' is set.

  If the `not_bol' field is set in the pattern buffer (*note GNU
Pattern Buffers::), then `^' fails to match at the beginning of the
string.  *Note POSIX Matching::, for when you might find this useful.

  If the `newline_anchor' field isn't set in the pattern buffer, then
`^' fails to match after a newline.  This is useful when you do not
regard the string to be matched as broken into lines.
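
  The effect of these settings can be observed through the POSIX
interface, whose `REG_NEWLINE' and `REG_NOTBOL' flags control,
respectively, whether `^' may match after a newline and whether it may
match at the beginning of the string.  The following is only a sketch;
`bol_match_offset' is a hypothetical helper.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return the offset in TEXT where PATTERN first matches, or -1 if it
   does not match (or does not compile).  CFLAGS goes to regcomp and
   EFLAGS to regexec.  */
static int
bol_match_offset (const char *pattern, const char *text,
                  int cflags, int eflags)
{
  regex_t re;
  regmatch_t m;
  int offset = -1;

  assert (pattern != NULL && text != NULL);
  if (regcomp (&re, pattern, cflags) != 0)
    return -1;
  if (regexec (&re, text, 1, &m, eflags) == 0)
    offset = (int) m.rm_so;
  regfree (&re);
  return offset;
}
```

  With `REG_NEWLINE', `^bar' should match `foo\nbar' at offset 4;
without it there should be no match.  With `REG_NOTBOL', `^bar' should
not match `bar' at all.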


File: regex.info,  Node: Match-end-of-line Operator,  Prev: Match-beginning-of-line Operator,  Up: Anchoring Operators

The Match-end-of-line Operator (`$')
------------------------------------

  This operator can match the empty string either at the end of the
string or before a newline character in the string.  Thus, it is said
to "anchor" the pattern to the end of a line.

  It is always represented by `$'.  For example, `foo$' usually
matches `foo' and the first three characters of `foo\nbar'.

  Its interaction with the syntax bits and pattern buffer fields is
exactly the dual of `^''s; see the previous section.  (That is,
"beginning" becomes "end", "next" becomes "previous", and "after"
becomes "before".)
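
  As with `^', a sketch using the POSIX interface shows the behavior:
`REG_NEWLINE' lets `$' match before a newline and `REG_NOTEOL' keeps it
from matching at the end of the string.  `eol_match_end' is a
hypothetical helper.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return the offset in TEXT just past the first match of PATTERN, or
   -1 if it does not match (or does not compile).  */
static int
eol_match_end (const char *pattern, const char *text,
               int cflags, int eflags)
{
  regex_t re;
  regmatch_t m;
  int end = -1;

  assert (pattern != NULL && text != NULL);
  if (regcomp (&re, pattern, cflags) != 0)
    return -1;
  if (regexec (&re, text, 1, &m, eflags) == 0)
    end = (int) m.rm_eo;
  regfree (&re);
  return end;
}
```

  With `REG_NEWLINE', `foo$' should match the first three characters
of `foo\nbar', so the helper returns 3.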


File: regex.info,  Node: GNU Operators,  Next: GNU Emacs Operators,  Prev: Common Operators,  Up: Top

GNU Operators
*************

  Following are operators that GNU defines (and POSIX doesn't).

* Menu:

* Word Operators::
* Buffer Operators::


File: regex.info,  Node: Word Operators,  Next: Buffer Operators,  Up: GNU Operators

Word Operators
==============

  The operators in this section require Regex to recognize parts of
words.  Regex uses a syntax table to determine whether or not a
character is part of a word, i.e., whether or not it is
"word-constituent".

* Menu:

* Non-Emacs Syntax Tables::
* Match-word-boundary Operator::        \b
* Match-within-word Operator::          \B
* Match-beginning-of-word Operator::    \<
* Match-end-of-word Operator::          \>
* Match-word-constituent Operator::     \w
* Match-non-word-constituent Operator:: \W


File: regex.info,  Node: Non-Emacs Syntax Tables,  Next: Match-word-boundary Operator,  Up: Word Operators

Non-Emacs Syntax Tables
-----------------------

  A "syntax table" is an array indexed by the characters in your
character set.  With 8-bit characters, therefore, a syntax table has
256 elements.  Regex always uses a `char *' variable `re_syntax_table'
as its syntax table.  In some cases, it initializes this variable and
in others it expects you to initialize it.

   * If Regex is compiled with the preprocessor symbols `emacs' and
     `SYNTAX_TABLE' both undefined, then Regex allocates
     `re_syntax_table' and initializes an element I either to `Sword'
     (which it defines) if I is a letter, number, or `_', or to zero if
     it's not.

   * If Regex is compiled with `emacs' undefined but `SYNTAX_TABLE'
     defined, then Regex expects you to define a `char *' variable
     `re_syntax_table' to be a valid syntax table.

   * *Note Emacs Syntax Tables::, for what happens when Regex is
     compiled with the preprocessor symbol `emacs' defined.
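
  The default initialization described in the first case can be
pictured with the following sketch.  It is not the code Regex uses:
`SKETCH_SWORD', `sketch_syntax_table' and `init_sketch_syntax_table'
are hypothetical names, and using `isalnum' to decide what counts as a
letter or number is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define SKETCH_SWORD 1          /* hypothetical stand-in for `Sword' */

/* One element per 8-bit character code.  */
static char sketch_syntax_table[256];

/* Mark letters, digits and `_' as word-constituent in TABLE, which
   has SIZE elements; every other element becomes zero.  */
static void
init_sketch_syntax_table (char *table, size_t size)
{
  size_t i;

  assert (table != NULL && size <= 256);
  for (i = 0; i < size; i++)
    table[i] = (isalnum ((int) i) || i == '_') ? SKETCH_SWORD : 0;
}
```

  After `init_sketch_syntax_table (sketch_syntax_table, 256)', the
elements for `a', `7' and `_' hold `SKETCH_SWORD', and the element for
a space holds zero.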



File: regex.info,  Node: Match-word-boundary Operator,  Next: Match-within-word Operator,  Prev: Non-Emacs Syntax Tables,  Up: Word Operators

The Match-word-boundary Operator (`\b')
---------------------------------------

  This operator (represented by `\b') matches the empty string at
either the beginning or the end of a word.  For example, `\brat\b'
matches the separate word `rat'.
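
  A minimal sketch of `\b' follows.  It assumes the GNU C library's
`regcomp', which recognizes the GNU operators in the POSIX syntaxes;
other implementations may not.  `boundary_offset' is a hypothetical
helper.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return the offset in TEXT where PATTERN first matches, or -1 if it
   does not match (or does not compile).  */
static int
boundary_offset (const char *pattern, const char *text)
{
  regex_t re;
  regmatch_t m;
  int offset = -1;

  assert (pattern != NULL && text != NULL);
  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;
  if (regexec (&re, text, 1, &m, 0) == 0)
    offset = (int) m.rm_so;
  regfree (&re);
  return offset;
}
```

  Under that assumption, `\brat\b' should match in `a rat ran' at
offset 2 and should not match anywhere in `crate'.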


File: regex.info,  Node: Match-within-word Operator,  Next: Match-beginning-of-word Operator,  Prev: Match-word-boundary Operator,  Up: Word Operators

The Match-within-word Operator (`\B')
-------------------------------------

  This operator (represented by `\B') matches the empty string within a
word.  For example, `c\Brat\Be' matches `crate', but `dirty \Brat'
doesn't match `dirty rat'.
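
  The two examples above can be checked with a sketch like this one,
again assuming the GNU C library's `regcomp' (which accepts `\B' in
the POSIX syntaxes); `within_word_offset' is a hypothetical helper.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return the offset in TEXT where PATTERN first matches, or -1 if it
   does not match (or does not compile).  */
static int
within_word_offset (const char *pattern, const char *text)
{
  regex_t re;
  regmatch_t m;
  int offset = -1;

  assert (pattern != NULL && text != NULL);
  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;
  if (regexec (&re, text, 1, &m, 0) == 0)
    offset = (int) m.rm_so;
  regfree (&re);
  return offset;
}
```

  Here `c\Brat\Be' should match `crate' at offset 0, and `dirty \Brat'
should give -1 for `dirty rat'.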


File: regex.info,  Node: Match-beginning-of-word Operator,  Next: Match-end-of-word Operator,  Prev: Match-within-word Operator,  Up: Word Operators

The Match-beginning-of-word Operator (`\<')
-------------------------------------------

  This operator (represented by `\<') matches the empty string at the
beginning of a word.


File: regex.info,  Node: Match-end-of-word Operator,  Next: Match-word-constituent Operator,  Prev: Match-beginning-of-word Operator,  Up: Word Operators

The Match-end-of-word Operator (`\>')
-------------------------------------

  This operator (represented by `\>') matches the empty string at the
end of a word.
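
  The sketch below contrasts `\>' with its counterpart `\<'.  It
assumes the GNU C library's `regcomp', which accepts both operators in
the POSIX syntaxes; `word_edge_offset' is a hypothetical helper.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return the offset in TEXT where PATTERN first matches, or -1 if it
   does not match (or does not compile).  */
static int
word_edge_offset (const char *pattern, const char *text)
{
  regex_t re;
  regmatch_t m;
  int offset = -1;

  assert (pattern != NULL && text != NULL);
  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;
  if (regexec (&re, text, 1, &m, 0) == 0)
    offset = (int) m.rm_so;
  regfree (&re);
  return offset;
}
```

  So `rat\>' should match `brat' (at offset 1) but not `rats', while
`\<rat' should match `a rat' (at offset 2) but not `brat'.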


File: regex.info,  Node: Match-word-constituent Operator,  Next: Match-non-word-constituent Operator,  Prev: Match-end-of-word Operator,  Up: Word Operators

The Match-word-constituent Operator (`\w')
------------------------------------------

  This operator (represented by `\w') matches any word-constituent
character.


File: regex.info,  Node: Match-non-word-constituent Operator,  Prev: Match-word-constituent Operator,  Up: Word Operators

The Match-non-word-constituent Operator (`\W')
----------------------------------------------

  This operator (represented by `\W') matches any character that is not
word-constituent.
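
  A sketch showing `\W' next to its complement `\w' follows.  It
assumes the GNU C library's `regcomp' and its default notion of a
word-constituent character (a letter, digit, or `_'); `word_span' is a
hypothetical helper.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Find the first match of PATTERN in TEXT and store where it starts
   and how long it is in *START and *LENGTH.  Returns 1 on a match,
   0 on no match, and -1 if PATTERN does not compile.  */
static int
word_span (const char *pattern, const char *text, int *start, int *length)
{
  regex_t re;
  regmatch_t m;
  int found;

  assert (pattern != NULL && text != NULL);
  assert (start != NULL && length != NULL);
  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;
  found = regexec (&re, text, 1, &m, 0) == 0;
  if (found)
    {
      *start = (int) m.rm_so;
      *length = (int) (m.rm_eo - m.rm_so);
    }
  regfree (&re);
  return found;
}
```

  For example, `\w+' applied to `  foo_1 bar' should report the five
characters `foo_1' starting at offset 2.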


File: regex.info,  Node: Buffer Operators,  Prev: Word Operators,  Up: GNU Operators

Buffer Operators
================

  Following are operators which work on buffers.  In Emacs, a "buffer"
is, naturally, an Emacs buffer.  For other programs, Regex considers the
entire string to be matched as the buffer.

* Menu:

* Match-beginning-of-buffer Operator::  \`
* Match-end-of-buffer Operator::        \'


File: regex.info,  Node: Match-beginning-of-buffer Operator,  Next: Match-end-of-buffer Operator,  Up: Buffer Operators

The Match-beginning-of-buffer Operator
--------------------------------------

  This operator (represented by `\`') matches the empty string at the
beginning of the buffer.


File: regex.info,  Node: Match-end-of-buffer Operator,  Prev: Match-beginning-of-buffer Operator,  Up: Buffer Operators

The Match-end-of-buffer Operator
--------------------------------

  This operator (represented by `\'') matches the empty string at the
end of the buffer.
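
  The difference between the buffer operators and the line anchors
shows up once the string contains a newline.  The sketch below assumes
the GNU C library's `regcomp' (which accepts the buffer operators in
the POSIX syntaxes) and uses `REG_NEWLINE' so that `^' and `$' match at
line boundaries; `buffer_match_offset' is a hypothetical helper.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return the offset in TEXT where PATTERN first matches, or -1 if it
   does not match (or does not compile).  The string TEXT is the whole
   "buffer".  */
static int
buffer_match_offset (const char *pattern, const char *text)
{
  regex_t re;
  regmatch_t m;
  int offset = -1;

  assert (pattern != NULL && text != NULL);
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NEWLINE) != 0)
    return -1;
  if (regexec (&re, text, 1, &m, 0) == 0)
    offset = (int) m.rm_so;
  regfree (&re);
  return offset;
}
```

  Against `bar\nfoo', `^foo' should match at offset 4 whereas `\`foo'
should not match; likewise `bar$' should match but `bar\'' should not.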


File: regex.info,  Node: GNU Emacs Operators,  Next: What Gets Matched?,  Prev: GNU Operators,  Up: Top

GNU Emacs Operators
*******************

  Following are operators that GNU defines (and POSIX doesn't) that you
can use only when Regex is compiled with the preprocessor symbol
`emacs' defined.

* Menu:

* Syntactic Class Operators::