![]() |
Home | Libraries | People | FAQ | More |
The following features of xpressive have not yet been implemented, but are
planned for the near future:
The following features are planned for xpressive 2.0:
[.a.]
and
[=a=]
std::locale
Since many of xpressive's users are likely to be familiar with the Boost.Regex library, I would be remiss if
I failed to point out some important differences between xpressive and Boost.Regex. In particular:
xpressive::basic_regex<>
is a template on the iterator type, not the character type.
xpressive::basic_regex<>
cannot be constructed directly from a string; rather, you must use basic_regex::compile()
or regex_compiler<>
to build a regex object from a string.
xpressive::basic_regex<>
does not have an imbue()
member function; rather, the imbue()
member function is in the xpressive::regex_compiler<>
factory.
boost::basic_regex<>
has a subset of std::basic_string<>
's
members. xpressive::basic_regex<>
does not. The members lacking are: assign()
, operator[]()
, max_size()
, begin()
, end()
, size()
, compare()
, and operator=(std::basic_string<>)
.
boost::basic_regex<>
but do not exist in xpressive::basic_regex<>
are: set_expression()
,
get_allocator()
,
imbue()
,
getloc()
,
getflags()
,
and str()
.
xpressive::basic_regex<>
does not have a RegexTraits template parameter. Customization of regex
syntax and localization behavior will be controlled by regex_compiler<>
and a custom regex facet for std::locale
.
xpressive::basic_regex<>
and xpressive::match_results<>
do not have an Allocator template parameter. This is by design.
match_not_dot_null
and
match_not_dot_newline
have
moved from the match_flag_type
enum to the syntax_option_type
enum, and they have changed names to not_dot_null
and not_dot_newline
.
syntax_option_type
enumeration values are not supported: escape_in_lists
,
char_classes
, intervals
, limited_ops
,
newline_alt
, bk_plus_qm
, bk_braces
,
bk_parens
, bk_refs
, bk_vbar
,
use_except
, failbit
, literal
,
perlex
, basic
,
extended
, emacs
, awk
,
grep
,egrep
,
sed
, JavaScript
,
JScript
.
match_flag_type
enumeration values are not supported: match_not_bob
,
match_not_eob
, match_perl
, match_posix
,
and match_extra
.
Also, in the current implementation, the regex algorithms in xpressive will not detect pathological behavior and abort by throwing an exception. It is up to you to write efficient patterns that do not behave pathologically.
The performance of xpressive is competitive with Boost.Regex. I have run performance benchmarks comparing static xpressive, dynamic xpressive and Boost.Regex on two platforms: gcc (Cygwin) and Visual C++. The tests include short matches and long searches. For both platforms, xpressive comes off well on short matches and roughly on par with Boost.Regex on long searches.
<disclaimer> As with all benchmarks, the true test is how xpressive performs with your patterns, your input, and your platform, so if performance matters in your application, it's best to run your own tests. </disclaimer>
Below are the results of a performance comparison between:
Test Specifications
The following tests evaluate the time taken to match the expression to the input string. For each result, the top number has been normalized relative to the fastest time, so 1.0 is as good as it gets. The bottom number (in parentheses) is the actual time in seconds. The best time has been marked in green.
static xpressive | dynamic xpressive | Boost | Text | Expression |
---|---|---|---|---|
1(8.79e‑07s) | 1.08(9.54e‑07s) | 2.51(2.2e‑06s) | 100- this is a line of ftp response which contains a message string | ^([0-9]+)(\-| |$)(.*)$ |
1.06(1.07e‑06s) | 1(1.01e‑06s) | 4.01(4.06e‑06s) | 1234-5678-1234-456 | ([[:digit:]]{4}[- ]){3}[[:digit:]]{3,4} |
1(1.4e‑06s) | 1.13(1.58e‑06s) | 2.89(4.05e‑06s) | john_maddock@compuserve.com | ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$ |
1(1.28e‑06s) | 1.16(1.49e‑06s) | 3.07(3.94e‑06s) | foo12@foo.edu | ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$ |
1(1.22e‑06s) | 1.2(1.46e‑06s) | 3.22(3.93e‑06s) | bob.smith@foo.tv | ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$ |
1.04(8.64e‑07s) | 1(8.34e‑07s) | 2.5(2.09e‑06s) | EH10 2QQ | ^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$ |
1.11(9.09e‑07s) | 1(8.19e‑07s) | 2.47(2.03e‑06s) | G1 1AA | ^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$ |
1.12(9.38e‑07s) | 1(8.34e‑07s) | 2.5(2.08e‑06s) | SW1 1ZZ | ^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$ |
1(7.9e‑07s) | 1.06(8.34e‑07s) | 2.49(1.96e‑06s) | 4/1/2001 | ^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$ |
1(8.19e‑07s) | 1.04(8.49e‑07s) | 2.4(1.97e‑06s) | 12/12/2001 | ^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$ |
1.09(8.95e‑07s) | 1(8.19e‑07s) | 2.4(1.96e‑06s) | 123 | ^[-+]?[[:digit:]]*\.?[[:digit:]]*$ |
1.11(8.79e‑07s) | 1(7.9e‑07s) | 2.57(2.03e‑06s) | +3.14159 | ^[-+]?[[:digit:]]*\.?[[:digit:]]*$ |
1.09(8.94e‑07s) | 1(8.19e‑07s) | 2.47(2.03e‑06s) | -3.14159 | ^[-+]?[[:digit:]]*\.?[[:digit:]]*$ |
The next test measures the time to find all matches in a long English text. The text is the complete works of Mark Twain, from Project Gutenberg. The text is 19Mb long. As above, the top number is the normalized time and the bottom number is the actual time. The best time is in green.
static xpressive | dynamic xpressive | Boost | Expression |
---|---|---|---|
1(0.0263s) | 1(0.0263s) | 1.78(0.0469s) | Twain |
1(0.0234s) | 1(0.0234s) | 1.79(0.042s) | Huck[[:alpha:]]+ |
1.84(1.26s) | 2.21(1.51s) | 1(0.687s) | [[:alpha:]]+ing |
1.09(0.192s) | 2(0.351s) | 1(0.176s) | ^[^
]*?Twain |
1.41(0.08s) | 1.21(0.0684s) | 1(0.0566s) | Tom|Sawyer|Huckleberry|Finn |
1.56(0.195s) | 1.12(0.141s) | 1(0.125s) | (Tom|Sawyer|Huckleberry|Finn).{0,30}river|river.{0,30}(Tom|Sawyer|Huckleberry|Finn) |
Below are the results of a performance comparison between:
Test Specifications
The following tests evaluate the time taken to match the expression to the input string. For each result, the top number has been normalized relative to the fastest time, so 1.0 is as good as it gets. The bottom number (in parentheses) is the actual time in seconds. The best time has been marked in green.
static xpressive | dynamic xpressive | Boost | Text | Expression |
---|---|---|---|---|
1(3.2e‑007s) | 1.37(4.4e‑007s) | 2.38(7.6e‑007s) | 100- this is a line of ftp response which contains a message string | ^([0-9]+)(\-| |$)(.*)$ |
1(6.4e‑007s) | 1.12(7.15e‑007s) | 1.72(1.1e‑006s) | 1234-5678-1234-456 | ([[:digit:]]{4}[- ]){3}[[:digit:]]{3,4} |
1(9.82e‑007s) | 1.3(1.28e‑006s) | 1.61(1.58e‑006s) | john_maddock@compuserve.com | ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$ |
1(8.94e‑007s) | 1.3(1.16e‑006s) | 1.7(1.52e‑006s) | foo12@foo.edu | ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$ |
1(9.09e‑007s) | 1.28(1.16e‑006s) | 1.67(1.52e‑006s) | bob.smith@foo.tv | ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$ |
1(3.06e‑007s) | 1.07(3.28e‑007s) | 1.95(5.96e‑007s) | EH10 2QQ | ^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$ |
1(3.13e‑007s) | 1.09(3.42e‑007s) | 1.86(5.81e‑007s) | G1 1AA | ^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$ |
1(3.2e‑007s) | 1.09(3.5e‑007s) | 1.86(5.96e‑007s) | SW1 1ZZ | ^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$ |
1(2.68e‑007s) | 1.22(3.28e‑007s) | 2(5.36e‑007s) | 4/1/2001 | ^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$ |
1(2.76e‑007s) | 1.16(3.2e‑007s) | 1.94(5.36e‑007s) | 12/12/2001 | ^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$ |
1(2.98e‑007s) | 1.03(3.06e‑007s) | 1.85(5.51e‑007s) | 123 | ^[-+]?[[:digit:]]*\.?[[:digit:]]*$ |
1(3.2e‑007s) | 1.12(3.58e‑007s) | 1.81(5.81e‑007s) | +3.14159 | ^[-+]?[[:digit:]]*\.?[[:digit:]]*$ |
1(3.28e‑007s) | 1.11(3.65e‑007s) | 1.77(5.81e‑007s) | -3.14159 | ^[-+]?[[:digit:]]*\.?[[:digit:]]*$ |
The next test measures the time to find all matches in a long English text. The text is the complete works of Mark Twain, from Project Gutenberg. The text is 19Mb long. As above, the top number is the normalized time and the bottom number is the actual time. The best time is in green.
static xpressive | dynamic xpressive | Boost | Expression |
---|---|---|---|
1(0.019s) | 1(0.019s) | 2.98(0.0566s) | Twain |
1(0.0176s) | 1(0.0176s) | 3.17(0.0556s) | Huck[[:alpha:]]+ |
3.62(1.78s) | 3.97(1.95s) | 1(0.492s) | [[:alpha:]]+ing |
2.32(0.344s) | 3.06(0.453s) | 1(0.148s) | ^[^
]*?Twain |
1(0.0576s) | 1.05(0.0606s) | 1.15(0.0664s) | Tom|Sawyer|Huckleberry|Finn |
1.24(0.164s) | 1.44(0.191s) | 1(0.133s) | (Tom|Sawyer|Huckleberry|Finn).{0,30}river|river.{0,30}(Tom|Sawyer|Huckleberry|Finn) |
Copyright © 2003, 2004 Eric Niebler |