import os
# This trick ensures that we always use the same random seed,
# hence running this documentation always gives the same result.
os.environ['VCSN_SEED'] = '1'
The set of all strings beginning with 101 and ending with 01010.
First, let's define our "context": we work with "labels are letters" (lal), on the alphabet $\{0, 1\}$. We don't use weights, or rather, we use the traditional Boolean weights: $\mathbb{B}$.
import vcsn
c = vcsn.context('lal_char(01), b')
c
Then, we build our expression using an unusual operator: $\mathsf{E} \& \mathsf{F}$ denotes the conjunction of expressions $\mathsf{E}$ and $\mathsf{F}$. In this case (unweighted/Boolean automata), it denotes exactly the intersection of languages.
e = c.expression('(101[01]*)&([01]*01010)')
e
We want to normalize this extended expression (it has conjunction and complement operators) into a basic expression. To this end, we first convert it to an automaton.
a = e.automaton()
a
and then we convert this automaton into a basic expression:
a.expression()
Or, in ASCII:
print(a.expression())
I'd like to know if it's possible to match lines that don't contain a specific word (e.g. hede) using a regular expression?
First, let's define that alphabet we work on: from $a$ to $z$ for instance.
import vcsn
c = vcsn.context('lal_char(a-z), b')
c
Then we define our expression, which is extended (it uses the complement operator), so to normalize it, we first convert it into automaton (with _expression_.automaton
), from which we extract a basic expression (with _automaton_.expresion
).
e = c.expression('(hede){c}')
e
a = e.automaton()
a
a.expression()
Or, in ASCII (+
is usually denoted |
; \e
denotes the empty word; and [^]
denotes any character, usually written .
):
print(a.expression())
Is there a tool (or an algorithm) to convert a finite state machine into a regular expression?
Vcsn is tool for rational expressions and automata. See http://vcsn.lrde.epita.fr.
import vcsn
Build a random automaton $a$ of 4 states, labeled by letters to choose in ${a, b, c}$. Then "lift" it into an automaton $b$ labeled by expressions.
a = vcsn.context('lal(abc), b').random_automaton(4, num_final=2)
b = a.lift()
b
Eliminate state 2, then 3, etc. The order does not matter for correction (any order gives a correct result), but the quality of the result does depend on this order.
b = b.eliminate_state(2)
b
b = b.eliminate_state(3); b
b = b.eliminate_state(1); b
Eliminate the last state, and read the answer.
b = b.eliminate_state(0); b
Alternatively, you can use the method automaton.expression
to ask for the result.
a.expression()
You may see this algorithm run "interactively" using %demo
.
a = vcsn.context('lal(abc), b').random_automaton(10)
b = a.lift()
%demo b eliminate_state
I'm trying to construct a regular expression from a Finite Automaton but found my self completely stuck with this one.
import vcsn
%%automaton a
$ -> q3
q1 -> q2 a
q1 -> q3 b
q2 -> q2 a, b
q3 -> q4 a
q3 -> q3 b
q4 -> q4 a
q4 -> q1 b
q3 -> $
q4 -> $
a.expression()
(This is the question with typos in $L_1$ and $L_2$ fixed.)
If $L_1= \{a^n b^m \mid n \geqslant 1, m \geqslant 0 \} \cup \{ba\}$ and $L_2= \{b^m \mid m \geqslant 1 \}$. I am not getting how the DFA for $L_1/L_2$ is constructed in the second figure using the DFA for $L_1$, please tell me the approach.
First, let's define our "context": we work with "labels are letters" (lal), on the alphabet $\{a, b\}$. We don't use weights, or rather, we use the traditional Boolean weights: $\mathbb{B}$.
import vcsn
c = vcsn.context('lal(ab), b')
c
Then define the first automaton, Figure 4.1.
%%automaton a1
$ -> q0
q1 -> $
q0 -> q1 a
q0 -> q3 b
q1 -> q1 a
q1 -> q2 b
q2 -> $
q2 -> q2 b
q2 -> q5 a
q3 -> q4 a
q3 -> q5 b
q4 -> $
q4 -> q5 a, b
q5 -> q5 a, b
a1.is_deterministic() and a1.is_complete()
Automaton $\mathcal{A}_2$ represents the language $L_2$:
a2 = c.expression('b{+}').automaton()
a2
We may now compute the quotient, which is indeed the automaton of Figure 4.2. It is better looking once trimmed.
a1 / a2
(a1 / a2).trim()