Simulating a regular expression

For other ways to use regular expressions in Grafstate, see the sections on∶

A regular expression is a string of finite length that encodes a regular language. The expression in a sequence of symbols from the input alphabet Σ together with the 3 regular operators. The regular operators are described below∶

Symbol	Grafstate code	Description
⋃	`\u`	The union of two regular languages L₁ ⋃ L₂ is a regular language.
·	`\.`	The concatenation of two regular languages L₁ ∘ L₂ is a regular language.
*	`*`	The Kleene closure of a regular language L* is a regular language.

The concatenation operator \. is optional. Adjacency may be used instead.

The following regular expressions are considered to be equal∶

(a ⋃ b)* · c

(a ⋃ b)*c

Below is the Grafstate code for the regular expressions above∶

(a \u b)* \. c

(a \u b)*c

The regular expression above describes the language of strings consisting of any combinations of a's and b's followd by a single c.

Syntax

A regular expression is checked against the following constraints. If a regular expression violates a constraint, then you will receive an error message.

1	The regular expression can only consist of the symbols in the input alphabet and `\e` in addition to the parentheses `(` and `)` , and the regular operators `\u` , `\.` and `*` .

2	Parentheses must be balanced.

3	There cannot be two adjacent operators.

4	No operator can follow a left parenthesis.

5	`\u` and `\.` cannot precede a right parenthesis.

Formally, ∅ is a valid regular expression. However, ∅ is not recognized as a Grafstate regular expression.

Simulating a regular expression

You can simulate a regular expression with a string using the :sim command with 2 arguments. The first argument is the regular expression with no spaces and the second argument is an input string. The input alphabet Σ is given as an argument introduced using the & symbol.

The following Grafstate code simulates the regular expression (a ⋃ b)*c with the string babc∶

Code
`:sim (a\ub)*c babc & S={a,b,c}`

Simulation
Input `babc`
Result MATCHED
Σ Input alphabet {a,b,c}

Regular expression∶ (a⋃b)*c

If an input string is not matched by a regular expression, then the Simulator will indicate how far the string can be read by the regular expression before matching fails.

The following Grafstate code simulates the regular expression (ab)* with the string aba∶

Code
`:sim (ab)* aba & S={a,b}`

Simulation
Input `aba`
Result NOT MATCHED
Σ Input alphabet {a,b}

Regular expression∶ (ab)*

Understanding regular expressions

The regular expression can be defined formally using the following recursive definition∶

Base steps

1	"∅" is a regular expression, and L("∅")=∅.

The regular expression ∅ represents the empty language.

2	"ε" is a regular expression, and L("ε")={ε}.

The regular expression ε represents the language containing only the empty string.

3	∀c∈Σ c is a regular expression, and L(c)={c}.

For each alphabet symbol c, the regular expression c represents the language containing only c.

Recursive steps

1	Let R₁ and R₂ be regular expressions. Then R₁ "⋃" R₂ is a regular expression, and L(R₁ "⋃" R₂)=L(R₁) ⋃ L(R₂).

2	Let R₁ and R₂ be regular expressions. Then R₁ "∘" R₂ is a regular expression, and L(R₁ "@" R₂) = L(R₁) ∘ L(R₂).

3	Let R be a regular expression. Then R"" is a regular expression, and L(R"") = L(R)*.

Let Σ = {a,b}. The regular expression (a ⋃ b)*a ⋃ ε represents the language

(({a} ⋃ {b})* ∘ {a}) ⋃ {ε}.