Simulating a regular expression

For other ways to use regular expressions in Grafstate, see the sections on∶

A regular expression is a string of finite length that encodes a regular language. The expression in a sequence of symbols from the input alphabet Σ together with the 3 regular operators. The regular operators are described below∶

Symbol
Grafstate code
Description
\u
The union of two regular languages L1 ⋃ L2 is a regular language.
·
\.
The concatenation of two regular languages L1 ∘ L2 is a regular language.
*
*
The Kleene closure of a regular language L* is a regular language.

The concatenation operator \. is optional. Adjacency may be used instead.

The following regular expressions are considered to be equal∶
(a ⋃ b)* · c
(a ⋃ b)*c

Below is the Grafstate code for the regular expressions above∶
(a \u b)* \. c
(a \u b)*c

The regular expression above describes the language of strings consisting of any combinations of a's and b's followd by a single c.

Syntax

A regular expression is checked against the following constraints. If a regular expression violates a constraint, then you will receive an error message.

1The regular expression can only consist of the symbols in the input alphabet and \e in addition to the parentheses ( and ) , and the regular operators \u , \. and * .
2Parentheses must be balanced.
3There cannot be two adjacent operators.
4No operator can follow a left parenthesis.
5 \u and \. cannot precede a right parenthesis.

Formally, ∅ is a valid regular expression. However, ∅ is not recognized as a Grafstate regular expression.

Simulating a regular expression

You can simulate a regular expression with a string using the :sim command with 2 arguments. The first argument is the regular expression with no spaces and the second argument is an input string. The input alphabet Σ is given as an argument introduced using the & symbol.

The following Grafstate code simulates the regular expression (a ⋃ b)*c with the string babc

Code
:sim (a\ub)*c babc & S={a,b,c}

Simulation
Input babc
Result MATCHED
Σ Input alphabet {a,b,c}
Regular expression∶ (a⋃b)*c

If an input string is not matched by a regular expression, then the Simulator will indicate how far the string can be read by the regular expression before matching fails.

The following Grafstate code simulates the regular expression (ab)* with the string aba∶

Code
:sim (ab)* aba & S={a,b}

Simulation
Input aba
Result NOT MATCHED
Σ Input alphabet {a,b}
Regular expression∶ (ab)*

Understanding regular expressions

The regular expression can be defined formally using the following recursive definition∶

Base steps
1"∅" is a regular expression, and L("∅")=∅.
The regular expression ∅ represents the empty language.
2"ε" is a regular expression, and L("ε")={ε}.
The regular expression ε represents the language containing only the empty string.
3∀c∈Σ c is a regular expression, and L(c)={c}.
For each alphabet symbol c, the regular expression c represents the language containing only c.

Recursive steps
1Let R1 and R2 be regular expressions. Then R1 "⋃" R2 is a regular expression, and L(R1 "⋃" R2)=L(R1) ⋃ L(R2).
2Let R1 and R2 be regular expressions. Then R1 "∘" R2 is a regular expression, and L(R1 "@" R2) = L(R1) ∘ L(R2).
3Let R be a regular expression. Then R"*" is a regular expression, and L(R"*") = L(R)*.

Let Σ = {a,b}. The regular expression (a ⋃ b)*a ⋃ ε represents the language
(({a} ⋃ {b})* ∘ {a}) ⋃ {ε}.