A regular expression (regex) is a sequence of characters that specifies a search pattern for string-searching algorithms. The concept was formalized in theoretical computer science by the American mathematician Stephen Cole Kleene.
Every regular expression has an equivalent finite-state automaton: a machine that accepts exactly the strings the expression matches.
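To make the link between patterns and automata concrete, here is a minimal sketch. It assumes a tiny hand-built automaton for the pattern `ab*`; the names `TRANSITIONS`, `ACCEPTING`, and `dfa_accepts` are made up for this illustration and are not part of any library.

```python
import re

# Hypothetical hand-built automaton for the pattern ab*
# (one 'a' followed by zero or more 'b').
# State 0 is the start state; state 1 is the only accepting state.
TRANSITIONS = {(0, "a"): 1, (1, "b"): 1}
ACCEPTING = {1}

def dfa_accepts(text):
    """Return True if the automaton ends in an accepting state."""
    state = 0
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:  # no transition defined for this character: reject
            return False
    return state in ACCEPTING

# The automaton and the regular expression agree on every sample.
samples = ["a", "abbb", "", "ba", "abab"]
results = [(s, dfa_accepts(s), bool(re.fullmatch(r"ab*", s))) for s in samples]
for sample, by_dfa, by_regex in results:
    print(repr(sample), by_dfa, by_regex)
```

The two columns printed for each sample should be identical, which is what "equivalent" means here.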
Applications and Uses:
- Regular expressions are used in search engines, in the search-and-replace dialogs of word processors and text editors, and in the lexical analysis phase of compilers.
- Some regular expression engines use just-in-time compilation to speed up matching.
- Other applications include input validation, web scraping, data wrangling, simple parsing, and syntax highlighting.
- PostgreSQL's regular expression support is built on the implementation originally written for Tcl (Tool Command Language).
Regular Expression basics:
Many programming languages, such as C, C++, Java, JavaScript, and Python, provide regular expression support either built in or through libraries, because pattern matching is useful in so many situations.
Python has a built-in “re” module that provides these search capabilities to programmers.
Example:
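import re
# G, then any 12 characters, then h: 14 characters in total
pattern = r'^G............h$'
string_1 = 'Gurpreet Singh'
result_1 = re.match(pattern, string_1)  # matches: 14 characters, G first and h last
print(result_1)
string_2 = 'Gurpreet'
result_2 = re.match(pattern, string_2)  # no match: only 8 characters
print(result_2)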
Output:
<re.Match object; span=(0, 14), match='Gurpreet Singh'>
None
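In the above example we are looking for a pattern: any fourteen-character string that starts with “G” and ends with “h”. Each dot stands for one character, and the space in the first string counts as one of them. The second string has only eight characters, so it does not match.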
Special Characters:
Character | Description |
---|---|
^ | Matches at the start of the string, so the expression to its right must begin there. In multi-line mode it also matches immediately after each \n. |
$ | Matches at the end of the string, or just before a trailing \n, so the expression to its left must end there. In multi-line mode it also matches before each \n. |
. | Matches any character except line terminators like \n. |
\ | It is used to escape or signal special characters. |
A|B | A and B are two expressions. Match either A or B. If A is matched, then B is not tested. |
+ | It matches the expression to its left 1 or more times. |
* | It matches the expression to its left 0 or more times. |
? | It matches 0 or 1 repetitions of the expression to its left. Adding ? after the +, ?, or * quantifiers makes them non-greedy. |
{m} | It matches exactly m repetitions of the expression to its left. If there are fewer than m repetitions, the whole expression does not match. |
{m,n} | It matches from m to n repetitions of the expression to its left, taking as many as possible. |
{m,n}? | It matches from m to n repetitions of the expression to its left, taking as few as possible. The trailing ? makes the quantifier non-greedy. |
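The difference between greedy and non-greedy quantifiers is easiest to see side by side. The following is a small sketch using Python's `re` module; the sample strings are made up for illustration.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ grabs as much as possible, so the match runs to the last '>'.
greedy = re.search(r"<.+>", text).group()

# Non-greedy: .+? stops at the first '>' that lets the pattern succeed.
lazy = re.search(r"<.+?>", text).group()

# {m,n} takes as many repetitions as it can; {m,n}? takes as few as it can.
greedy_range = re.search(r"a{2,4}", "aaaaa").group()
lazy_range = re.search(r"a{2,4}?", "aaaaa").group()

# \. escapes the dot, | chooses between alternatives, $ anchors at the end.
extension = re.search(r"\.(png|jpg)$", "photo.jpg").group(1)

print(greedy)        # <b>bold</b> and <i>italic</i>
print(lazy)          # <b>
print(greedy_range)  # aaaa
print(lazy_range)    # aa
print(extension)     # jpg
```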
Character Classes
Character Classes | Description |
---|---|
\w | It matches any word character: a letter (a to z, A to Z), a digit (0 to 9), or an underscore. |
\d | It matches any digit from 0 to 9. |
\D | It matches any character that is not a digit. |
\s | It matches any whitespace character, including the space, \t, \r, and \n. |
\S | It matches any character that is not whitespace. |
\b | It matches the empty string at the start or end of a word (a word boundary). |
\B | It matches the empty string at a position that is not the start or end of a word. |
\A | It matches only at the start of the string. |
\Z | It matches only at the end of the string. |
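A short sketch of these character classes in use, with made-up sample strings. Note that \A anchors to the start of the whole string even in multi-line mode, whereas ^ then also matches at the start of each line.

```python
import re

text = "Order 42 shipped to user_7 on 2024-01-15"

digit_runs = re.findall(r"\d+", text)   # runs of digits
word_runs = re.findall(r"\w+", text)    # runs of word characters
fields = re.split(r"\s+", "a \t b\nc")  # split on any whitespace run

# \b requires a word boundary on each side, so only standalone "cat" matches.
whole_words = re.findall(r"\bcat\b", "cat concat cats cat.")

# \B requires a non-boundary, so only the "cat" inside "concat" matches.
inner = re.findall(r"\Bcat", "cat concat")

# \A is the start of the string; ^ with re.MULTILINE is the start of any line.
at_string_start = re.search(r"\Atwo", "one\ntwo", re.MULTILINE)
at_line_start = re.search(r"^two", "one\ntwo", re.MULTILINE)

print(digit_runs)       # ['42', '7', '2024', '01', '15']
print(fields)           # ['a', 'b', 'c']
print(whole_words)      # ['cat', 'cat']
print(inner)            # ['cat']
print(at_string_start)  # None
print(at_line_start is not None)  # True
```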
Sets
Set | Description |
---|---|
[] | It represents the set of characters to match. |
[qwe] | It matches listed characters either q, w, or e. It does not match ‘qwe’ as a whole. |
[a-z] | It matches any lowercase alphabet letter from a to z. |
[f\-m] | It matches either f, -, or m. The backslash makes the hyphen a literal character instead of a range. |
[0-] | It matches either 0 or -. |
[-9] | It matches either - or 9. |
[a-z0-9] | It matches any lowercase letter from a to z or any digit from 0 to 9. |
[(+*)] | Special characters lose their special meaning inside a set. It matches either (, +, *, or ). |
[^yz9] | A ^ at the start of a set excludes the listed characters. It matches any character other than y, z, and 9. |
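The sets from the table can be tried out with `re.findall`, which returns each matching character. This is a small sketch; the input strings are made up for illustration.

```python
import re

listed = re.findall(r"[qwe]", "qwerty")        # any one of q, w, e
ranged = re.findall(r"[a-z0-9]", "Ab3-z")      # lowercase letter or digit
escaped = re.findall(r"[f\-m]", "f-g-m")       # f, literal hyphen, or m
trailing = re.findall(r"[0-]", "10-20")        # 0 or literal hyphen
specials = re.findall(r"[(+*)]", "(1+2)*3")    # metacharacters are literal here
negated = re.findall(r"[^yz9]", "xyz789")      # anything except y, z, 9

print(listed)    # ['q', 'w', 'e']
print(ranged)    # ['b', '3', 'z']
print(escaped)   # ['f', '-', '-', 'm']
print(trailing)  # ['0', '-', '0']
print(specials)  # ['(', '+', ')', '*']
print(negated)   # ['x', '7', '8']
```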
Groups
Group | Description |
---|---|
() | It groups the enclosed expression and captures the text it matches. |
(?) | Inside parentheses like this, ? acts as an extension notation. Its meaning depends on the character immediately to its right. |
(?P<name>AB) | Matches the expression AB and captures it in a group that can be accessed by the name “name”. |
(?aiLmsux) | Sets one or more flags, where the letters mean: a: match ASCII only; i: ignore case; L: locale dependent; m: multi-line; s: dot matches all characters, including \n; u: match Unicode; x: verbose. |
(?:A) | Matches the expression A, but unlike (?P<name>AB), the matched text cannot be retrieved afterward. |
(?#…) | To specify a comment. |
A(?=B) | Lookahead assertion. This matches the expression A only if it is followed by B. |
A(?!B) | Negative lookahead assertion. This matches the expression A only if it is not followed by B. |
(?<=B)A | Positive look-behind assertion. This matches the expression A only if B is immediately to its left. B must be a fixed-length expression. |
(?<!B)A | Negative look-behind assertion. This matches the expression A only if B is not immediately to its left. B must be a fixed-length expression. |
(?P=name) | Matches the expression matched by an earlier group named “name”. |
(…)\1 | A backreference. \1 matches the same text that the first group matched, so the group does not have to be written out again. Groups are numbered from 1 up to 99. |
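Groups and assertions are easier to follow with concrete input. The sketch below exercises several rows of the table; the group names `year`, `month`, and `q` and all sample strings are made up for illustration.

```python
import re

# Named groups: retrieve parts of the match by name or by number.
date = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "Released 2024-01-15")
year, month = date.group("year"), date.group("month")

# Non-capturing group: groups for repetition without creating a capture.
repeats = re.findall(r"(?:ab)+", "ababab ab")

# Lookahead and negative lookahead: test what follows without consuming it.
usd_amounts = re.findall(r"\d+(?= USD)", "10 USD, 20 EUR, 30 USD")
other_amounts = re.findall(r"\b\d+\b(?! USD)", "10 USD, 20 EUR, 30 USD")

# Look-behind and negative look-behind: test what precedes the match.
prices = re.findall(r"(?<=\$)\d+", "cost $15, qty 3")
plain_numbers = re.findall(r"(?<!\$)\b\d+", "cost $15, qty 3")

# Backreferences: \1 and (?P=q) must match the same text the group matched.
doubled = re.search(r"\b(\w+) \1\b", "it is is fine").group()
quoted = re.search(r"(?P<q>['\"]).*?(?P=q)", "say 'hi' now").group()

# Inline flag: (?i) at the start makes the whole pattern case-insensitive.
any_case = re.findall(r"(?i)python", "Python PYTHON python")

print(year, month)    # 2024 01
print(repeats)        # ['ababab', 'ab']
print(usd_amounts)    # ['10', '30']
print(other_amounts)  # ['20']
print(prices)         # ['15']
print(plain_numbers)  # ['3']
print(doubled)        # is is
print(quoted)         # 'hi'
print(any_case)       # ['Python', 'PYTHON', 'python']
```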
Python “re” module functions:
Function | Description |
---|---|
re.findall(A, B) | Returns a list of all non-overlapping matches of expression A in string B. |
re.search(A, B) | Returns a match object for the first location where expression A matches in string B, or None if there is no match. |
re.split(A, B) | Returns the list of strings obtained by splitting string B wherever expression A matches. |
re.sub(X, Y, S) | Returns a copy of string S in which every match of expression X is replaced with Y. |
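The four functions can be compared on a single input. This is a minimal sketch with a made-up sample string.

```python
import re

text = "apple, banana;cherry  date"

# findall: every non-overlapping match, as a list of strings.
words = re.findall(r"[a-z]+", text)

# search: a match object for the first match, or None when nothing matches.
first_b = re.search(r"b\w+", text)
missing = re.search(r"xyz", text)

# split: cut the string wherever the delimiter pattern matches.
parts = re.split(r"[,;\s]+", text)

# sub: replace every match; here each whitespace run becomes one space.
tidy = re.sub(r"\s+", " ", text)

print(words)                            # ['apple', 'banana', 'cherry', 'date']
print(first_b.group(), first_b.span())  # banana (7, 13)
print(missing)                          # None
print(parts)                            # ['apple', 'banana', 'cherry', 'date']
print(tidy)                             # apple, banana;cherry date
```

Because `re.search` returns None when there is no match, check the result before calling `.group()` on it.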