Scalable Simulation Framework
Regular Expressions

Regular expressions are a compact, formal way to specify character patterns in ASCII strings. For convenience, this page gives a brief summary of regular expression syntax, following the conventions of Henry Spencer's regexp package, which was included in Tcl releases prior to tcl8.0. There is also a very readable chapter on regular expressions in Brent B. Welch, Practical Programming in Tcl and Tk, 2nd ed., Prentice Hall 1997. See also standard compiler textbooks such as Alfred V. Aho, Ravi Sethi and Jeffrey D. Ullman, Compilers: Principles, Techniques, and Tools, Addison-Wesley 1986.

Regular Expression Syntax

A regular expression (over the ASCII alphabet) is defined recursively as follows:

 

Examples of regular expressions

 abc%x  matches the string abc%x
 ..  matches all two-character strings
 ab*  matches strings a, ab, abb, abbb,...
 (ab)*  matches strings empty, ab, abab, ababab,...
 a+  matches strings a, aa, aaa,...
 ab?  matches strings a or ab
 [Hh]ello  matches Hello or hello
 hello|Hello  matches Hello or hello
 .*  matches any string
 \.  matches a period "."
 ^(one|2|3)  matches a string beginning with "one" or "2" or "3"
 [1-9][0-9]*  matches any integer greater than zero
 [a-zA-Z0-9]+  matches a string of one or more letters or digits
 [^a-d]  matches any single character other than a, b, c, d
 ^[a-zA-Z]$  matches a string of exactly one letter
 array\[N\]  matches the string "array[N]"
 ^[^\n]*\n  matches everything from the beginning of a string up to a newline
 [ \t\n]*  matches a (possibly empty) run of whitespace (spaces, tabs, newlines)
[^:]+://[^:/]+(:[0-9]+)?/.*  matches a URL, e.g., http://www.ssfnet.org:80/home/index.html
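
Several of the examples above can be checked mechanically. The sketch below uses Python's standard re module, which is an assumption made purely for illustration: it is a different engine from the Henry Spencer package this page follows, although the patterns chosen here use only syntax common to both. re.fullmatch tests whether a pattern accounts for an entire string; the helper name check and the test strings are hypothetical.

```python
import re

# Each entry: (pattern, strings that should match entirely, strings that should not).
CASES = [
    (r"ab*", ["a", "ab", "abbb"], ["", "ba"]),
    (r"(ab)*", ["", "ab", "abab"], ["a", "aba"]),
    (r"ab?", ["a", "ab"], ["abb"]),
    (r"[Hh]ello", ["Hello", "hello"], ["HELLO"]),
    (r"[1-9][0-9]*", ["7", "42"], ["0", "007"]),
    (r"array\[N\]", ["array[N]"], ["arrayN"]),
]

def check(cases):
    """Return the (pattern, string) pairs that did not behave as listed in cases."""
    failures = []
    for pattern, good, bad in cases:
        for s in good:
            if re.fullmatch(pattern, s) is None:
                failures.append((pattern, s))
        for s in bad:
            if re.fullmatch(pattern, s) is not None:
                failures.append((pattern, s))
    return failures

# The URL pattern from the list, applied to the page's own sample URL.
URL = r"[^:]+://[^:/]+(:[0-9]+)?/.*"
url_match = re.fullmatch(URL, "http://www.ssfnet.org:80/home/index.html")

print(check(CASES))        # []
print(url_match.group(1))  # :80  (the optional port part, colon included)
```

The parenthesized part of the URL pattern is optional, so a URL without a port would still match and group(1) would then be None.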

 

Things to know

A regular expression does not have to match the whole string. There can be unmatched characters before and after the match. To force a match of the entire string, the regular expression must begin with "^" and end with "$".
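
The difference between an unanchored and an anchored pattern can be seen in a short sketch. Python's re module is assumed here only as a convenient stand-in; re.search looks for a match anywhere in the string, which corresponds to the behavior described above. One caveat of this stand-in: in Python, "$" also matches just before a trailing newline, so the sample strings below contain none.

```python
import re

text = "abc123def"

# Unanchored: the pattern may match anywhere inside the string.
inner = re.search(r"[0-9]+", text)

# Anchored with ^ and $: the pattern must account for the whole string.
whole = re.search(r"^[0-9]+$", text)
digits_only = re.search(r"^[0-9]+$", "123")

print(inner.group())        # 123
print(whole)                # None
print(digits_only.group())  # 123
```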

If a pattern can match several substrings of a string, the earliest match in the string is taken. Then, if more than one match starts at that point, the longest one is taken.
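
The earliest-then-longest rule can be spelled out as a brute-force search. This is a minimal sketch, not how any real matcher is implemented: the function name leftmost_longest is hypothetical, and Python's re module is assumed only as a way to test whether a pattern matches a given substring exactly. The sketch is meant for patterns without "^" or "$". Python's own re.search is also shown for contrast, because it takes the earliest match but, for alternatives separated by "|", accepts the first alternative that succeeds rather than the longest.

```python
import re

def leftmost_longest(pattern, text):
    """Return (start, end) of the earliest, then longest, match, or None.

    Tries every start position from left to right; at each start it tries
    the longest candidate substring first and shrinks it one character at
    a time.
    """
    compiled = re.compile(pattern)
    for start in range(len(text) + 1):
        for end in range(len(text), start - 1, -1):
            if compiled.fullmatch(text, start, end):
                return (start, end)
    return None

text = "xabab"

# Earliest start is index 1; from there "ab" is longer than "a".
span = leftmost_longest(r"a|ab", text)
print(text[span[0]:span[1]])              # ab

# Python's built-in search stops at the first alternative that works.
print(re.search(r"a|ab", text).group())   # a

# With a single repeated element both approaches agree.
span2 = leftmost_longest(r"a+", "xaayaaa")
print("xaayaaa"[span2[0]:span2[1]])       # aa
```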


ato 28 March 1999