Difference between revisions of "FSAs and Regular Expressions"

From ACSL Category Descriptions
Jump to navigation Jump to search
Line 150: Line 150:
= Video Resources =
= Video Resources =

The following YouTube videos show ACSL students and advisors working out some ACSL problems that have appeared in previous contests. Some of the videos contain ads; ACSL is not responsible for the ads and does not receive compensation in any form for those ads.  
Nice two-part video showing the relationship between FSAs and REs.  
| <youtube width="300" height="180">https://youtu.be/GwsU2LPs85U</youtube>
| [https://youtu.be/GwsU2LPs85U ''1 - Convert Regular Expression to Finite-State Automaton'' ('''Barry Brown''')]

| <youtube width="300" height="180">https://youtu.be/shN_kHBFOUE</youtube>
| [https://youtu.be/shN_kHBFOUE ''2 - Convert Regular Expression to Finite-State Automaton'' ('''Barry Brown''')]

This video uses the symbol "+" to mean "1 or more matches of the previous term". For example, "ab+" is the same as "abb*".

Line 164: Line 173:
1 - Convert Regular Expression to Finite-State Automaton
This video shows you how to convert a simple regular expression to a finite-state automaton (FSA). Finite-state automatons are also called finite-state machines.
Barry Brown
2 - Convert Regular Expression to Finite-State Automaton
The second in a series of videos about the relationship between regular expressions and finite-state automatons (or finite-state machines). This video goes over two more complex examples.
Barry Brown
Online Regular Expression Testers
Online regex tester and debugger: PHP, PCRE, Python, Golang and ...
Online regex tester, debugger with highlighting for PHP, PCRE, Python, Golang and JavaScript. ... Regular Expression. No Match. /. /. gm. Test String. Switch to ...
RegExr: Learn, Build, & Test RegEx
Regular expression tester with syntax highlighting, PHP / PCRE & JS Support, contextual help, cheat sheet, reference, and searchable community patterns.
Free Online Regular Expression Tester - FreeFormatter.com
Free online regular expression tester with cheatsheet and most common solutions to common problems.
Regex Tester and Debugger Online - Javascript, PCRE, PHP
Regular Expression Tester with highlighting for Javascript and PCRE. Quickly test and debug your regex.
Rubular: a Ruby regular expression editor and tester
Rubular is a Ruby-based regular expression editor and tester. It's a handy way to test regular expressions as you write them. Rubular is an especially good fit for ...
Regex Tester - Javascript, PCRE, PHP
Test your Javascript and PCRE regular expressions online.
Regular Expression Tester online
Regular Expression Tester. ... Regular expression. Test With. Replace With (option). Case-insensitive. Global match. Web Toolkit Online works only in your ...
Pythex: a Python regular expression editor
Pythex is a real-time regular expression editor for Python, a quick way to test your regular expressions.
.NET Regex Tester - Regex Storm
Online .NET regular expression tester with real-time highlighting and detailed results output.
Debuggex: Online visual regex tester. JavaScript, Python, and PCRE.
Test your regex by visualizing it with a live editor. JavaScript, Python, and ... Untitled Regex No description. Done Editing Embed on ... My regular expression. X.

Revision as of 22:57, 11 September 2018

A Finite State Automation (FSA) has four components: an input alphabet (those letters or strings which are legal inputs); a set of transition rules to advance from state to state (given a current state and an element of the input alphabet, what is the next state); a unique start state; and one or more final states. We can draw the FSA, as shown below, by representing each state as a circular node; the final state as a double circle; the start state as the only node with an incoming arrow; and the transition rules by the strings on the edges connecting the nodes. When labels are assigned to states, they appear inside the circle representing the state.

Fsa 1.png

If there is a path from the start state to a final state by which an input string can be parsed, then the input string is said to be “accepted” by the FSA. The FSA above will accept strings composed of one or more x’s followed by one or more y’s (e.g., xy, xxy, xxxyy, xyyy).


Just like Boolean Algebra is a convenient algebraic representation of Digital Electronics Circuits, a regular expression is an algebraic representation of an FSA. For example, the regular expression corresponding to the first FSA given above is xx*yy*. The regular expression for the second FSA is extremely complex! The following simple FSAs and REs illustrate the correspondence:

Fsas 2-3.png
ab*c ___________________ (aUb)c or acUbc

The rules for forming a regular expression (RE) are as follows:

1. The null string (λ) is a RE.
2. If the string a is in the input alphabet, then it is a RE.
3. if the strings a and b are both REs, then so are the strings built up using the following rules:
a. CONCATENATION. “ab” (a followed by b).
b. UNION. “aUb” (a or b).
c. CLOSURE. “a*” (a repeated zero or more times).

If we have a regular expression, then we can mechanically build an FSA to accept the strings which are generated by the regular expression. Conversely, if we have an FSA, we can mechanically develop a regular expression which will describe the strings which can be parsed by the FSA. For a given FSA or regular expression, there are many others which are equivalent to it. A “most simplified” regular expression or FSA is not always well defined.

Identities for regular expressions appear below. The order of precedence for regular expression operators is: Kleene Star, concatenation, and then union. The Kleene Star binds from right to left; concatenation and union both bind from left to right.

Most programming languages today have a package to handle Regular Expressions in order to do pattern matching algorithms more easily. Visual BASIC has the LIKE operator that is very easy to use. We will use these “basic” symbols in our category description.

Pattern Description
? Any single character
* Zero or more of the preceding character
# Any single digit
[ char-char,char,…] Any inclusive range or list of characters
{![charlist]} Any character not in a given range or list

For example, the following strings either match or don’t match the pattern “[A-M][o-z][!a,e,I,o,u]?#.*”:

YES NO Why not?
App$5.java Help1. E is not in the range [o-z].
Dog&2.py IoU$5 There is no . at the end.
LzyG3.txt move6.py The m is not in the range [A-M].
Hot90. MaX7A.txt The A is not a digit from 0 to 9.

Typical problems in the category will include: translate an FSA to a regular expression; simplify a regular expression (possibly to minimize the number of states or transitions); determine which regular expressions are equivalent; and determine which strings are accepted by either an FSA or a regular expression.

Basic Identities

1. (a*)* = a* 5. a(ba)* = (ab)*a
2. aa* = a*a 6. (a U b)* = (a* U b*)*
3. aa* U λ = a* 7. (a U b)* = (a*b*)*
4. a(b U c) = ab U ac 8. (a U b)* = a*(ba*)*

Sample Problems

Sample Problem 1

Find a simplified regular expression for the following FSA:

Fsa s1.png


The expression 01*01 is read directly from the FSA. It is in its most simplified form.

Sample Problem 2

Which, if any, of the following Regular Expressions are equivalent?

A. (aUb)(ab*)(b*Ua)
B. (aab*Ubab*)a
C. aab*Ubab*UaabaUbab*a
D. aab*Ubab*Uaab*aUbab*a
E. a*Ub*


On first inspection, B can be discarded because it is the only RE whose strings must end in an a. E can also be discarded since it is the only RE that can accept a null string. C and D are not equal. After expanding A, we must compare it to C and D. It is equal to D, but not to C. A and D.

Sample Problem 3

Which of the following strings match the regular expression pattern "?[A-D]*[a-d]*#" ?

A. 8AAAabd4
B. MM5
D. Abcd9
E. &dd88
F. >BAD0


The answer is A, C, D, and F. In B the character M is not in the range A-D or a-d. In E, there cannot be more than 1 digit at the end of the string. Remember that an asterisk represents 0 or more of that character or rangle of characters.

Sample Problem 4

Which of the following strings are accepted by the following regular expression "00*1*1U11*0*0" ?

A. 0000001111111
B. 1010101010
C. 1111111
D. 0110
E. 10


The answer is A and E. A uses the left hand pattern and E uses the right hand pattern. Every string must stat and end in the opposite digit so C and D don’t work. B doesn’t work because all 1’s and all 0’s must be together.

Video Resources

Nice two-part video showing the relationship between FSAs and REs.

1 - Convert Regular Expression to Finite-State Automaton (Barry Brown)

2 - Convert Regular Expression to Finite-State Automaton (Barry Brown)

This video uses the symbol "+" to mean "1 or more matches of the previous term". For example, "ab+" is the same as "abb*".