# Difference between revisions of "FSAs and Regular Expressions"

Marc Brown (Talk | contribs) m (Marc Brown moved page Regular Expressions to FSAs and Regular Expressions without leaving a redirect) |
Marc Brown (Talk | contribs) |
||

Line 1: | Line 1: | ||

− | A Finite State | + | A Finite State Automaton (FSA) is a mathematical model of computation comprising: a finite number of ''states'', of which exactly one is ''active'' at any given time; ''transition rules'' to change |

+ | the active state; an ''initial state''; and one more ''final states''. We can draw an FSA by representing each state as a circle; the final state as a double circle; the start state as the only state with an incoming arrow; and the transition rules as labeled-edges connecting the states. When labels are assigned to states, they appear inside the circle representing the state. | ||

− | + | In this category, FSAs will be limited to parsing strings. That is, determining if a string is valid or not. | |

− | + | = Basics = | |

− | + | Here is a drawing of an FSA that is used to parse strings consists of x's and y's: | |

− | + | <center> | |

+ | [[File:fsa.svg|250px]] | ||

+ | </center> | ||

− | + | In the above FSA, there are three states: A, B, and C. The initial state is A; the final state is C. The only way to go from state A to B is by ''seeing'' the letter x. Once in state B, there are two transition rules: Seeing the letter y will cause the FSA to make C the active state, and seeing an x will keep B as the active state. State C is a final state, so if the string being parsed is completed and the FSA is in State C, the input string is said to be ''accepted'' by the FSA. In State C, seeing any additional letter y will keep the machine in state C. The FSA above will accept strings composed of one or more x’s followed by one or more y’s (e.g., xy, xxy, xxxyy, xyyy, xxyyyy). | |

− | + | ||

− | + | Just like [[Boolean Algebra]] is a convenient algebraic representation of [[Digital Electronics]] Circuits, a regular expression is an algebraic representation of an FSA. For example, the regular expression corresponding to the first FSA given above is xx*yy*. | |

− | + | ||

− | + | ||

The rules for forming a regular expression (RE) are as follows: | The rules for forming a regular expression (RE) are as follows: | ||

Line 21: | Line 22: | ||

::a. CONCATENATION. “ab” (a followed by b). | ::a. CONCATENATION. “ab” (a followed by b). | ||

::b. UNION. “aUb” (a or b). | ::b. UNION. “aUb” (a or b). | ||

− | ::c. CLOSURE. “a*” (a repeated zero or more times). | + | ::c. CLOSURE. “a*” (a repeated zero or more times). This is known as the Kleene Star. |

+ | |||

+ | Identities for regular expressions appear in the next section. The order of precedence for regular expression operators is: Kleene Star, concatenation, and then union. | ||

If we have a regular expression, then we can mechanically build an FSA to accept the strings which are generated by the regular expression. Conversely, if we have an FSA, we can mechanically develop a regular expression which will describe the strings which can be parsed by the FSA. For a given FSA or regular expression, there are many others which are equivalent to it. A “most simplified” regular expression or FSA is not always well defined. | If we have a regular expression, then we can mechanically build an FSA to accept the strings which are generated by the regular expression. Conversely, if we have an FSA, we can mechanically develop a regular expression which will describe the strings which can be parsed by the FSA. For a given FSA or regular expression, there are many others which are equivalent to it. A “most simplified” regular expression or FSA is not always well defined. | ||

− | + | <!-- | |

− | + | ||

− | + | ||

Most programming languages today have a package to handle Regular Expressions in order to do pattern matching algorithms more easily. Visual BASIC has the LIKE operator that is very easy to use. We will use these “basic” symbols in our category description. | Most programming languages today have a package to handle Regular Expressions in order to do pattern matching algorithms more easily. Visual BASIC has the LIKE operator that is very easy to use. We will use these “basic” symbols in our category description. | ||

Line 74: | Line 75: | ||

|The A is not a digit from 0 to 9. | |The A is not a digit from 0 to 9. | ||

|} | |} | ||

+ | --> | ||

Typical problems in the category will include: translate an FSA to a regular expression; simplify a regular expression (possibly to minimize the number of states or transitions); determine which regular expressions are equivalent; and determine which strings are accepted by either an FSA or a regular expression. | Typical problems in the category will include: translate an FSA to a regular expression; simplify a regular expression (possibly to minimize the number of states or transitions); determine which regular expressions are equivalent; and determine which strings are accepted by either an FSA or a regular expression. | ||

− | = | + | = Regular Expression Identities = |

{| Class="wikitable" | {| Class="wikitable" | ||

|- | |- | ||

|1. (a*)* = a* | |1. (a*)* = a* | ||

− | |||

|- | |- | ||

|2. aa* = a*a | |2. aa* = a*a | ||

− | |||

|- | |- | ||

|3. aa* U λ = a* | |3. aa* U λ = a* | ||

− | |||

|- | |- | ||

|4. a(b U c) = ab U ac | |4. a(b U c) = ab U ac | ||

+ | |- | ||

+ | |5. a(ba)* = (ab)*a | ||

+ | |- | ||

+ | |6. (a U b)* = (a* U b*)* | ||

+ | |- | ||

+ | |7. (a U b)* = (a*b*)* | ||

+ | |- | ||

|8. (a U b)* = a*(ba*)* | |8. (a U b)* = a*(ba*)* | ||

|} | |} | ||

Line 96: | Line 102: | ||

= Sample Problems = | = Sample Problems = | ||

− | == | + | == Problem 1 == |

Find a simplified regular expression for the following FSA: | Find a simplified regular expression for the following FSA: | ||

Line 106: | Line 112: | ||

The expression 01*01 is read directly from the FSA. It is in its most simplified form. | The expression 01*01 is read directly from the FSA. It is in its most simplified form. | ||

− | == | + | == Problem 2 == |

+ | |||

+ | Which of the following strings are accepted by the following regular expression "00*1*1U11*0*0" ? | ||

+ | |||

+ | ::A. 0000001111111 | ||

+ | ::B. 1010101010 | ||

+ | ::C. 1111111 | ||

+ | ::D. 0110 | ||

+ | ::E. 10 | ||

+ | |||

+ | '''Solution:''' | ||

+ | |||

+ | Choice A uses the left hand pattern (00*1*1) and choice E uses the right hand pattern (11*0*0). | ||

+ | Every string must start and end in the opposite digit so choices C and D don’t work. Choice B doesn’t work because all 1s and all 0s must be together. | ||

+ | The correct answer is A and E. | ||

+ | |||

+ | == Problem 3 == | ||

Which, if any, of the following Regular Expressions are equivalent? | Which, if any, of the following Regular Expressions are equivalent? | ||

Line 117: | Line 139: | ||

'''Solution:''' | '''Solution:''' | ||

− | + | Choice B can be discarded because it is the only RE whose strings '''must''' end with an a. Choice E can be discarded since it is the only RE that can accept a null string. Choices C and D are not equal. After expanding choice A, we must compare it to choices C and D. It is equal to choice D, but not to choice C. The only REs that are equivalent are choices A and D. | |

− | == | + | <!-- |

+ | == Problem 3 == | ||

Which of the following strings match the regular expression pattern "?[A-D]*[a-d]*#" ? | Which of the following strings match the regular expression pattern "?[A-D]*[a-d]*#" ? | ||

Line 133: | Line 156: | ||

The answer is A, C, D, and F. In B the character M is not in the range A-D or a-d. In E, there cannot be more than 1 digit at the end of the string. Remember that an asterisk represents 0 or more of that character or rangle of characters. | The answer is A, C, D, and F. In B the character M is not in the range A-D or a-d. In E, there cannot be more than 1 digit at the end of the string. Remember that an asterisk represents 0 or more of that character or rangle of characters. | ||

− | + | --> | |

− | + | ||

− | + | ||

− | + | ||

− | + | ||

− | + | ||

− | + | ||

− | + | ||

− | + | ||

− | + | ||

− | + | ||

− | + | ||

− | + | ||

− | + | ||

= Video Resources = | = Video Resources = | ||

Line 157: | Line 167: | ||

| [https://youtu.be/GwsU2LPs85U ''1 - Convert Regular Expression to Finite-State Automaton'' ('''Barry Brown''')] | | [https://youtu.be/GwsU2LPs85U ''1 - Convert Regular Expression to Finite-State Automaton'' ('''Barry Brown''')] | ||

− | + | |- | |

| <youtube width="300" height="180">https://youtu.be/shN_kHBFOUE</youtube> | | <youtube width="300" height="180">https://youtu.be/shN_kHBFOUE</youtube> | ||

| [https://youtu.be/shN_kHBFOUE ''2 - Convert Regular Expression to Finite-State Automaton'' ('''Barry Brown''')] | | [https://youtu.be/shN_kHBFOUE ''2 - Convert Regular Expression to Finite-State Automaton'' ('''Barry Brown''')] |

## Revision as of 23:29, 11 September 2018

A Finite State Automaton (FSA) is a mathematical model of computation comprising: a finite number of *states*, of which exactly one is *active* at any given time; *transition rules* to change
the active state; an *initial state*; and one more *final states*. We can draw an FSA by representing each state as a circle; the final state as a double circle; the start state as the only state with an incoming arrow; and the transition rules as labeled-edges connecting the states. When labels are assigned to states, they appear inside the circle representing the state.

In this category, FSAs will be limited to parsing strings. That is, determining if a string is valid or not.

## Contents

# Basics

Here is a drawing of an FSA that is used to parse strings consists of x's and y's:

In the above FSA, there are three states: A, B, and C. The initial state is A; the final state is C. The only way to go from state A to B is by *seeing* the letter x. Once in state B, there are two transition rules: Seeing the letter y will cause the FSA to make C the active state, and seeing an x will keep B as the active state. State C is a final state, so if the string being parsed is completed and the FSA is in State C, the input string is said to be *accepted* by the FSA. In State C, seeing any additional letter y will keep the machine in state C. The FSA above will accept strings composed of one or more x’s followed by one or more y’s (e.g., xy, xxy, xxxyy, xyyy, xxyyyy).

Just like Boolean Algebra is a convenient algebraic representation of Digital Electronics Circuits, a regular expression is an algebraic representation of an FSA. For example, the regular expression corresponding to the first FSA given above is xx*yy*.

The rules for forming a regular expression (RE) are as follows:

- 1. The null string (λ) is a RE.
- 2. If the string a is in the input alphabet, then it is a RE.
- 3. if the strings a and b are both REs, then so are the strings built up using the following rules:
- a. CONCATENATION. “ab” (a followed by b).
- b. UNION. “aUb” (a or b).
- c. CLOSURE. “a*” (a repeated zero or more times). This is known as the Kleene Star.

Identities for regular expressions appear in the next section. The order of precedence for regular expression operators is: Kleene Star, concatenation, and then union.

If we have a regular expression, then we can mechanically build an FSA to accept the strings which are generated by the regular expression. Conversely, if we have an FSA, we can mechanically develop a regular expression which will describe the strings which can be parsed by the FSA. For a given FSA or regular expression, there are many others which are equivalent to it. A “most simplified” regular expression or FSA is not always well defined.

Typical problems in the category will include: translate an FSA to a regular expression; simplify a regular expression (possibly to minimize the number of states or transitions); determine which regular expressions are equivalent; and determine which strings are accepted by either an FSA or a regular expression.

# Regular Expression Identities

1. (a*)* = a* |

2. aa* = a*a |

3. aa* U λ = a* |

4. a(b U c) = ab U ac |

5. a(ba)* = (ab)*a |

6. (a U b)* = (a* U b*)* |

7. (a U b)* = (a*b*)* |

8. (a U b)* = a*(ba*)* |

# Sample Problems

## Problem 1

Find a simplified regular expression for the following FSA:

**Solution:**

The expression 01*01 is read directly from the FSA. It is in its most simplified form.

## Problem 2

Which of the following strings are accepted by the following regular expression "00*1*1U11*0*0" ?

- A. 0000001111111
- B. 1010101010
- C. 1111111
- D. 0110
- E. 10

**Solution:**

Choice A uses the left hand pattern (00*1*1) and choice E uses the right hand pattern (11*0*0). Every string must start and end in the opposite digit so choices C and D don’t work. Choice B doesn’t work because all 1s and all 0s must be together. The correct answer is A and E.

## Problem 3

Which, if any, of the following Regular Expressions are equivalent?

- A. (aUb)(ab*)(b*Ua)
- B. (aab*Ubab*)a
- C. aab*Ubab*UaabaUbab*a
- D. aab*Ubab*Uaab*aUbab*a
- E. a*Ub*

**Solution:**

Choice B can be discarded because it is the only RE whose strings **must** end with an a. Choice E can be discarded since it is the only RE that can accept a null string. Choices C and D are not equal. After expanding choice A, we must compare it to choices C and D. It is equal to choice D, but not to choice C. The only REs that are equivalent are choices A and D.

# Video Resources

Nice two-part video showing the relationship between FSAs and REs.

1 - Convert Regular Expression to Finite-State Automaton (Barry Brown)
| |

2 - Convert Regular Expression to Finite-State Automaton (Barry Brown)
This video uses the symbol "+" to mean "1 or more matches of the previous term". For example, "ab+" is the same as "abb*". |