Scalable Simulation Framework
Regular Expressions

Regular expressions are a compact, formal way to specify character patterns in ASCII strings. For convenience, this page provides a brief summary of regular expression syntax, following the conventions of the regexp package written by Henry Spencer, which was included in Tcl releases prior to Tcl 8.0. There is also a very readable chapter on regular expressions in Brent B. Welch, Practical Programming in Tcl and Tk, 2nd ed., Prentice Hall, 1997. See also the standard compiler textbooks, such as Alfred V. Aho, Ravi Sethi and Jeffrey D. Ullman, Compilers: Principles, Techniques, and Tools, Addison-Wesley, 1986.

### Regular Expression Syntax

A regular expression (over the ASCII alphabet) is defined recursively as follows:

• A literal character or an empty set is a regular expression.
• Repetition operators:
  • (r)* is a regular expression denoting zero or more repetitions of r
  • (r)+ is a regular expression denoting one or more repetitions of r
  • (r)? is a regular expression denoting zero or one repetitions of r
• Concatenation: (r)(s) is a regular expression denoting r followed by s
• Alternation operator: (r)|(s) is a regular expression denoting r OR s
• The repetition, concatenation and alternation operators are all left-associative, with the following precedence, from highest to lowest:
  • grouping in parentheses ()
  • unary repetition operators *, +, ?
  • concatenation
  • alternation
• Special matching characters (precede one with a backslash \ to match it literally):
  • . (a period) matches any single character
  • ^(r) matches r only at the beginning of a string; the ^ must come first
  • (r)$ matches r only at the end of a string; the $ must come last
• Matching from a set of characters:
  • [x-y] matches any single character whose ASCII code lies between those of x and y, inclusive
  • [xyz] matches a single character from the set {x, y, z}, equivalent to (x|y|z)
  • [^xyz] matches a single character not in the set {x, y, z}
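The precedence rules above can be checked with a short sketch. It uses Python's standard `re` module, which is not Spencer's regexp package but shares the basic syntax summarized here; the helper name `matches` is hypothetical. `re.fullmatch` is used so that the pattern must cover the whole test string.

```python
import re

def matches(pattern, text):
    """True if pattern matches the whole of text (anchored at both ends)."""
    return re.fullmatch(pattern, text) is not None

# Repetition binds tighter than concatenation: ab* is a(b*), not (ab)*.
assert matches(r"ab*", "abbb")
assert not matches(r"ab*", "abab")
assert matches(r"(ab)*", "abab")

# Concatenation binds tighter than alternation: ab|cd is (ab)|(cd).
assert matches(r"ab|cd", "cd")
assert not matches(r"ab|cd", "abd")

# Character sets and ranges match exactly one character.
assert matches(r"[a-c]", "b")
assert matches(r"[^a-c]", "x")
assert not matches(r"[^a-c]", "a")

# A backslash makes a special character literal.
assert matches(r"a\.b", "a.b")
assert not matches(r"a\.b", "axb")
assert matches(r"a.b", "axb")
print("all precedence checks passed")
```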

### Examples of regular expressions

• `abc%x` matches the string "abc%x"
• `..` matches any two consecutive characters, so it matches every string of two or more characters
• `ab*` matches a, ab, abb, abbb, ...
• `(ab)*` matches the empty string, ab, abab, ababab, ...
• `a+` matches a, aa, aaa, ...
• `ab?` matches a or ab
• `[Hh]ello` matches Hello or hello
• `hello|Hello` matches Hello or hello
• `.*` matches any string, including the empty string
• `\.` matches a period "."
• `^(one|2|3)` matches a string beginning with "one", "2" or "3"
• `[1-9][0-9]*` matches any integer greater than zero (written without leading zeros)
• `[a-zA-Z0-9]+` matches a run of one or more letters or digits
• `[^a-d]` matches any single character other than a, b, c or d
• `^[a-zA-Z]$` matches a string of exactly one letter
• `array\[N\]` matches the string "array[N]"
• `^[^\n]*\n` matches everything from the beginning of a string up to and including the first newline
• `[ \t\n]*` matches a (possibly empty) run of whitespace (spaces, tabs, newlines)
• `[^:]+://[^:/]+(:[0-9]+)?/.*` matches a URL, e.g., http://www.ssfnet.org:80/home/index.html
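Several of these examples can be checked mechanically. The sketch below again assumes Python's `re` module as a stand-in for the Tcl regexp package; `re.search` is unanchored, like the matching described under "Things to know". The `CASES` table is an illustrative selection chosen for this sketch, not an exhaustive test.

```python
import re

# (pattern, string, expected) triples drawn from the examples above.
# re.search succeeds if the pattern matches anywhere in the string.
CASES = [
    (r"..", "a", False),                 # needs at least two characters
    (r"ab*", "a", True),                 # b* may match nothing
    (r"ab*", "abbb", True),
    (r"(ab)*", "", True),                # zero repetitions match the empty string
    (r"[Hh]ello", "say Hello", True),
    (r"^(one|2|3)", "2nd", True),
    (r"^(one|2|3)", "none", False),      # ^ pins the match to the start
    (r"[^a-d]", "abcd", False),          # every character is in a-d
    (r"[^a-d]", "abcx", True),           # x is outside a-d
    (r"^[a-zA-Z]$", "x", True),
    (r"^[a-zA-Z]$", "xy", False),
    (r"array\[N\]", "array[N]", True),
    (r"[^:]+://[^:/]+(:[0-9]+)?/.*",
     "http://www.ssfnet.org:80/home/index.html", True),
]

for pattern, text, expected in CASES:
    found = re.search(pattern, text) is not None
    assert found == expected, (pattern, text)
print("all examples behave as described")
```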


### Things to know

A regular expression does not have to match the whole string: there may be unmatched characters before and after the match. To force a match of the entire string, begin the regular expression with "^" and end it with "$".
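A minimal sketch of the difference between unanchored and anchored matching, again using Python's `re` module as an assumed stand-in; `matches_anywhere` is a hypothetical helper name.

```python
import re

def matches_anywhere(pattern, text):
    """Unanchored test: True if the pattern matches some substring of text."""
    return re.search(pattern, text) is not None

# Without anchors, characters before and after the match are ignored.
assert matches_anywhere(r"[0-9]+", "abc123def")
# With ^ and $, the pattern has to account for the entire string.
assert not matches_anywhere(r"^[0-9]+$", "abc123def")
assert matches_anywhere(r"^[0-9]+$", "123")
```

One Python-specific detail: `$` also matches just before a trailing newline, so `^[0-9]+$` accepts "123\n"; `re.fullmatch` does not.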

If a pattern can match several substrings of a string, the earliest match in the string is taken. If more than one match starts at that point, the longest one is taken.
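A sketch in Python's `re` module illustrates this rule and one place where a backtracking engine departs from it. `re.search` also returns the earliest match, and its greedy `*`, `+` and `?` prefer as many repetitions as possible. For alternation, however, `re` takes the first branch that succeeds, so `a|ab` on "abc" yields "a", whereas the longest-match rule above would give "ab". The helper `first_match` is hypothetical.

```python
import re

def first_match(pattern, text):
    """Return the substring found by re.search, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

print(first_match(r"[0-9]+", "abc 123 45"))  # 123: the earliest match wins
print(first_match(r"a+", "baaab"))           # aaa: greedy + takes the whole run
print(first_match(r"a|ab", "abc"))           # a: first alternative that succeeds
```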

ato 28 March 1999