4 Reg Ex

Download as pdf or txt
Download as pdf or txt
You are on page 1of 26

Regular Expressions

Definitions Equivalence to Finite Automata

REs: Introduction
Regular expressions are an algebraic way to describe languages. They describe exactly the regular languages. If E is a regular expression, then L(E) is the language it defines. Well describe REs and their languages recursively.
2

REs: Definition
Basis 1: If a is any symbol, then a is a RE, and L(a) = {a}. Basis 2: is a RE, and L() = {}. Basis 3: is a RE, and L() = .
Note: {a} is the language containing one string, and that string is of length 1.

REs: Definition (2)


Induction 1: If E1 and E2 are regular expressions, then E1+E2 is a regular expression, and L(E1+E2) = L(E1)L(E2). Induction 2: If E1 and E2 are regular expressions, then E1E2 is a regular expression, and L(E1E2) = L(E1)L(E2).
Concatenation : the set of strings wx such that w
Is in L(E1) and x is in L(E2).
4

REs: Definition (3)


Induction 3: If E is a RE, then E* is a RE, and L(E*) = (L(E))*.
Closure, or Kleene closure = set of strings

w1w2wn, for some n > 0, where each wi is in L(E). Note: when n=0, the string is .

Precedence of Operators
Parentheses may be used wherever needed to influence the grouping of operators. Order of precedence is * (highest), then concatenation, then + (lowest).

Examples: REs
L(01) = {01}. L(01+0) = {01, 0}. L(0(1+0)) = {01, 00}.
Note order of precedence of operators.

L(0*) = {, 0, 00, 000, }. L((0+10)*(+1)) = all strings of 0s and 1s without two consecutive 1s.
7

Equivalence of REs and Automata


We need to show that for every RE, there is an automaton that accepts the same language.
Pick the most powerful automaton type: the -NFA.

And we need to show that for every automaton, there is a RE defining its language.
Pick the most restrictive type: the DFA.
8

Converting a RE to an -NFA
Proof is an induction on the number of operators (+, concatenation, *) in the RE. We always construct an automaton of a special form (next slide).

Form of -NFAs Constructed

Start state: Only state with external predecessors

No arcs from outside, no arcs leaving

Final state: Only state with external successors

10

RE to -NFA: Basis
Symbol a: :
: a

11

RE to -NFA: Induction 1 Union


For E1

For E2

For E1

E2

12

RE to -NFA: Induction 2 Concatenation

For E1

For E2

For E1E2

13

RE to -NFA: Induction 3 Closure


For E

For E*
14

DFA-to-RE
A strange sort of induction. States of the DFA are assumed to be 1,2,,n. We construct REs for the labels of restricted sets of paths.
Basis: single arcs or no arc at all. Induction: paths that are allowed to traverse next state in order.
15

k-Paths
A k-path is a path through the graph of the DFA that goes though no state numbered higher than k. Endpoints are not restricted; they can be any state.

16

Example: k-Paths
1 1 0 1 0 3 2 0 1 0-paths from 2 to 3: RE for labels = 0. 1-paths from 2 to 3: RE for labels = 0+11. 2-paths from 2 to 3: RE for labels = (10)*0+1(01)*1 3-paths from 2 to 3: RE for labels = ??

17

k-Path Induction
Let Rijk be the regular expression for the set of labels of k-paths from state i to state j. Basis: k=0. Rij0 = sum of labels of arc from i to j.
if no such arc. But add if i=j.

18

Example: Basis
R120 = 0. R110 = + = .

1 0 1

0 3

2 0 1

19

k-Path Inductive Case


A k-path from i to j either:
1. Never goes through state k, or 2. Goes through k one or more times.

Rijk = Rijk-1 + Rikk-1(Rkkk-1)* Rkjk-1.


Doesnt go through k Goes from i to k the first time Zero or more times from k to k Then, from k to j

20

Illustration of Induction
Path to k i Paths not going through k k From k to k Several times j

States < k

From k to j

21

Final Step
The RE with the same language as the DFA is the sum (union) of Rijn, where:
1. n is the number of states; i.e., paths are unconstrained. 2. i is the start state. 3. j is one of the final states.

22

Example

1 0 1

0 3

2 0 1

R233 = R232 + R232(R332)*R332 = R232(R332)* R232 = (10)*0+1(01)*1 R332 = 0(01)*(1+00) + 1(10)*(0+11) R233 = [(10)*0+1(01)*1] [(0(01)*(1+00) + 1(10)*(0+11))]*
23

Summary
Each of the three types of automata (DFA, NFA, -NFA) we discussed, and regular expressions as well, define exactly the same set of languages: the regular languages.

24

Algebraic Laws for REs


Union and concatenation behave sort of like addition and multiplication.
+ is commutative and associative; concatenation is associative. Concatenation distributes over +. Exception: Concatenation is not commutative.

25

Identities and Annihilators

is the identity for +.

R +

= R.

is the identity for concatenation.


R = R = R.

is the annihilator for concatenation.


= R =
.

26

You might also like