Advanced Regular Expressions

From ACSL Category Descriptions
Revision as of 10:19, 1 September 2020 by Mariana

More useful patterns

Columns: Pattern, Description, REGEX, Sample match, Sample not match

\d  Digit.
Matches any digit. Equivalent to [0-9].
REGEX: \d\d\d
Sample match: 123
Sample not match: 1-3

\D  Non-digit.
Matches any character that is not a digit.
REGEX: \d\D\d
Sample match: 1-3
Sample not match: 123

\w  Word.
Matches any alphanumeric character or the underscore. Equivalent to [a-zA-Z0-9_].
REGEX: \w\w\w
Sample match: a_A
Sample not match: a-A

\W  Not Word.
Matches any character that is not a word character (that is, neither alphanumeric nor an underscore).
REGEX: \W\W\W
Sample match: +-$
Sample not match: +_@

\s  Whitespace.
Matches any whitespace character (space, tab, line break).
REGEX: \d\s\w
Sample match: 1 a
Sample not match: 1ab

\S  Not Whitespace.
Matches any character that is not a whitespace character (space, tab, line break).
REGEX: \w\w\w\w\S\d
Sample match: Test#1
Sample not match: test 1

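The character-class samples above can be checked mechanically. The following is a minimal sketch using Python's standard `re` module; the choice of Python, the helper name `check`, and the use of `re.fullmatch` are assumptions made for illustration. The `re.ASCII` flag is passed because, by default, Python's `\d`, `\w` and `\s` also accept non-ASCII characters, which would go beyond the equivalences such as [0-9] stated above.

```python
import re

# (pattern, sample that matches, sample that does not match),
# copied from the entries above
SAMPLES = [
    (r"\d\d\d", "123", "1-3"),
    (r"\d\D\d", "1-3", "123"),
    (r"\w\w\w", "a_A", "a-A"),
    (r"\W\W\W", "+-$", "+_@"),
    (r"\d\s\w", "1 a", "1ab"),
    (r"\w\w\w\w\S\d", "Test#1", "test 1"),
]


def check(pattern, text):
    """Return True if the pattern matches the whole text (hypothetical helper)."""
    # re.ASCII limits \d, \w and \s to the ASCII sets described above
    return re.fullmatch(pattern, text, flags=re.ASCII) is not None


# One row per pattern: (pattern, match sample accepted?, non-match sample accepted?)
results = [(p, check(p, good), check(p, bad)) for p, good, bad in SAMPLES]

for pattern, good_ok, bad_ok in results:
    print(f"{pattern:15} match sample: {good_ok}   non-match sample: {bad_ok}")
```

Each row should report True for the match sample and False for the non-match sample.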
\b  Word boundaries.
Can be used to match a complete word. A word boundary is the position between a word character and a non-word character (or the start or end of the string).
REGEX: \bis\b
Sample match: is;
Sample not match: This, island:
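A short sketch of the word-boundary samples, assuming Python's `re` module. `re.search` is used because \b is tested anywhere inside the string; the sentence passed to `findall` is a made-up example, not taken from the table.

```python
import re

word_is = re.compile(r"\bis\b")

# Samples from the entry above
samples = ["is;", "This", "island:"]
found = {s: word_is.search(s) is not None for s in samples}

# Hypothetical sentence: only the standalone word "is" is picked up,
# not the "is" inside "This" or at the start of "island".
hits = word_is.findall("This is an island")

print(found)
print(hits)
```

In "This" the letters "is" are preceded by the word character "h", and in "island:" they are followed by "l", so neither position is a boundary on both sides.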

{}  The curly braces {…}.
Repeats the preceding character (or group of characters) as many times as the value inside the braces.
{n} means the preceding character is matched exactly n times.
{min,} means the preceding character is matched min times or more.
{min,max} means the preceding character is matched at least min and at most max times.
REGEX: abc{2}
Sample match: abcc
Sample not match: abc
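The three forms of the curly-brace quantifier can be compared side by side. This is a sketch in Python; only abc{2} with its two samples comes from the entry above, while the patterns (abc){2}, a{2,} and a{2,3} and their test strings are assumptions added for illustration.

```python
import re


def full(pattern, text):
    """Return True if the pattern matches the whole text (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None


# From the entry above: {2} applies only to the preceding character "c"
exact = {s: full(r"abc{2}", s) for s in ("abcc", "abc")}

# Illustrative: parentheses make the quantifier apply to the whole group "abc"
grouped = full(r"(abc){2}", "abcabc")

# Illustrative: {min,} means min or more repetitions
at_least = {s: full(r"a{2,}", s) for s in ("a", "aa", "aaaaa")}

# Illustrative: {min,max} means between min and max repetitions
bounded = {s: full(r"a{2,3}", s) for s in ("a", "aa", "aaa", "aaaa")}

print(exact, grouped, at_least, bounded, sep="\n")
```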

.*  Matches any character (except for line terminators) between zero and unlimited times.
REGEX: .*
Sample match: abbb, the empty string
Sample not match: (none, because zero repetitions are allowed)

.+  Matches any character (except for line terminators) between one and unlimited times.
REGEX: .+
Sample match: a, abbcc
Sample not match: the empty string

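The difference between .* and .+ shows up only on the empty string. The sketch below assumes Python's `re` module; the two-line string and the `re.DOTALL` flag are additions used to illustrate the "except for line terminators" remark.

```python
import re

star = re.compile(r".*")
plus = re.compile(r".+")

# .* accepts zero characters, .+ needs at least one
star_results = {s: star.fullmatch(s) is not None for s in ("abbb", "")}
plus_results = {s: plus.fullmatch(s) is not None for s in ("a", "abbcc", "")}

# Illustrative: the dot stops at a line break unless re.DOTALL is given
two_lines = "ab\ncd"
default_dot = re.fullmatch(r".+", two_lines) is not None
dotall_dot = re.fullmatch(r".+", two_lines, flags=re.DOTALL) is not None

print(star_results, plus_results, default_dot, dotall_dot, sep="\n")
```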
^  Anchor ^. The start of the line.
Matches the position just before the first character of the string.
REGEX: ^The\s\w+
Sample match: The contest
Sample not match: One contest

$  Anchor $. The end of the line.
Matches the position just after the last character of the string.
REGEX: \d{4}\sACSL$
Sample match: 2020 ACSL
Sample not match: 2020 STAR

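A sketch of both anchors, assuming Python's `re` module. `re.search` is used so that the anchors, rather than the matching function, decide where the match may sit. The string "Join The contest" is a hypothetical sample showing what ^ rules out.

```python
import re

starts_with_the = re.compile(r"^The\s\w+")
ends_with_acsl = re.compile(r"\d{4}\sACSL$")

# Samples from the two entries above
start_results = {s: starts_with_the.search(s) is not None
                 for s in ("The contest", "One contest")}
end_results = {s: ends_with_acsl.search(s) is not None
               for s in ("2020 ACSL", "2020 STAR")}

# Hypothetical sample: without ^ the pattern is found in the middle of the
# string; with ^ it must begin at the first character.
unanchored = re.search(r"The\s\w+", "Join The contest") is not None
anchored = starts_with_the.search("Join The contest") is not None

print(start_results, end_results, unanchored, anchored, sep="\n")
```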
\  Escape a special character.
To use a metacharacter as a literal in a regex, escape it with a backslash, for example: \. \* \+ \[
REGEX: \w\w\w\.
Sample match: cat.
Sample not match: lion

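The effect of the backslash is easiest to see by comparing the escaped and unescaped dot. This sketch assumes Python's `re` module; the literal string "1+1=2" and the use of `re.escape` are illustrative additions.

```python
import re

escaped = re.compile(r"\w\w\w\.")   # the last character must be a literal "."
unescaped = re.compile(r"\w\w\w.")  # the last character can be anything

# Samples from the entry above
escaped_results = {s: escaped.fullmatch(s) is not None for s in ("cat.", "lion")}
unescaped_results = {s: unescaped.fullmatch(s) is not None for s in ("cat.", "lion")}

# Illustrative: re.escape adds the backslashes needed to treat a string literally.
literal = "1+1=2"
raw_hit = re.fullmatch(literal, literal) is not None             # "+" acts as a quantifier
safe_hit = re.fullmatch(re.escape(literal), literal) is not None  # "+" is a plain character

print(escaped_results, unescaped_results, raw_hit, safe_hit, sep="\n")
```

Without the backslash, "lion" is accepted because the dot matches the final "n".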
()  Groups.
Regular expressions can not only match text but also extract information for further processing. This is done by defining groups of characters and capturing them with parentheses ().
REGEX: ^(file.+)\.docx$
Sample match: file_graphs.docx, file_lisp.docx
Sample not match: data.docx

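Capturing means the text matched inside the parentheses can be read back after the match. A minimal sketch, assuming Python's `re` module and its `Match.group` method; the helper name `captured_name` is hypothetical.

```python
import re

docx_file = re.compile(r"^(file.+)\.docx$")


def captured_name(filename):
    """Return the text captured by group 1, or None if there is no match."""
    match = docx_file.search(filename)
    return match.group(1) if match else None


# Samples from the entry above
names = {f: captured_name(f)
         for f in ("file_graphs.docx", "file_lisp.docx", "data.docx")}

print(names)
```

Group 1 holds the file name without the ".docx" extension; "data.docx" yields None because it does not start with "file".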
\number  Backreference.
Parentheses group part of a regular expression so that it acts as a single unit and its match is captured.
\n matches, at the current position, the same text that the n-th parenthesized group captured.

\1  Contents of Group 1.
REGEX: r(\w)g\1x
Sample match: regex (Group \1 is e)
Sample not match: regxx

\2  Contents of Group 2.
REGEX: (\d\d)\+(\d\d)=\2\+\1
Sample match: 20+21=21+20 (Group \1 is 20, Group \2 is 21)
Sample not match: 20+21=20+21
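The backreference samples can be traced with a short sketch, assuming Python's `re` module, where \1 and \2 inside a pattern refer to earlier groups in the same way as described above. The helper name `groups_or_none` is hypothetical.

```python
import re

repeat_letter = re.compile(r"r(\w)g\1x")
swapped_sum = re.compile(r"(\d\d)\+(\d\d)=\2\+\1")


def groups_or_none(pattern, text):
    """Return the tuple of captured groups, or None if the text does not match."""
    match = pattern.fullmatch(text)
    return match.groups() if match else None


# Samples from the entries above
letter_results = {s: groups_or_none(repeat_letter, s) for s in ("regex", "regxx")}
sum_results = {s: groups_or_none(swapped_sum, s)
               for s in ("20+21=21+20", "20+21=20+21")}

print(letter_results)
print(sum_results)
```

In "regxx" group 1 captures "e", so \1 requires another "e" where the string has "x". In "20+21=20+21" the right-hand side would have to start with the contents of group 2, which is "21".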

Sample Problems