Please note: The Regex syntax shown here is for JavaScript; the syntax in other languages may differ slightly!
Regular Expressions, or Regex for short, can be used to extract lists of simple or complex combinations of characters in a larger text.
From the text "otnhu,ch230 onte 2389]" we can, for example, get all digits, all words, or the number of times a single character is repeated (with some wrapper functionality in, say, C# or Java SE).
A very good tool for testing your regular expressions is regexpal. Just paste your expression and test data into the two text boxes.
Take a look at regexpal and find the Quick Reference on the right.

It's a lot of info, so let us start simple.
Character classes
\d matches any digit, that is, one character from 0 to 9.
The result of '\d' can thus be substituted with '2' or '4' or '7' etc.
\d is called a character class.
Regex: \d
Matches: hello 2 to34
Regex: \d\d
Matches: hello 2 to34
'\d' will give you a list containing the three items '2', '3' and '4'.
'\d\d' will give you a list containing one item, '34'.
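If you would rather try this in code than in regexpal, here is a minimal JavaScript sketch. The variable names are my own, and it assumes you want every match, which is what the g flag asks for.

```javascript
// Sample text from the example above.
const text = "hello 2 to34";

// With the g flag, match() returns every match instead of only the first one.
const singleDigits = text.match(/\d/g);
const digitPairs = text.match(/\d\d/g);

console.log(singleDigits); // [ '2', '3', '4' ]
console.log(digitPairs);   // [ '34' ]
```

Note that the lone '2' is not part of the \d\d result, since it has no second digit next to it.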
Elements
All single characters, or substitutes such as \d, are elements.
'2', 'a', 'B', '\d' are elements.
'22', 'AQ', 'TTT', '\d\d' are not elements.
Elements can also be constructed from many characters, using '(' and ')', for example (AB) is an element.
Quantifiers
Elements can have quantifiers, written after the element, which modify it.
+
The '+' quantifier modifies the preceding element to match one or more of itself.
So '+' after '\d' makes the '\d' repeat one or more times, until it reaches a non-digit.
Regex: \d+
Matches: my cute 235opossum is called 7777
As you see, the \d begins at '2' and the + quantifier makes \d repeat until '5'. The search then skips the following characters until it finds the first '7' and matches up to the last '7'.
Thus we get a list of two items containing '235' and '7777'.
So, when \d reaches 235 it is repeated three times, like '\d\d\d'. And on 7777 it is repeated four times, like '\d\d\d\d'.
You can test this manually:
Regex: \d\d
Matches: my cute 235opossum is called 7777
Regex: \d\d\d
Matches: my cute 235opossum is called 7777
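The three patterns can be compared side by side in JavaScript. This is only a sketch with variable names I made up, but it shows how \d+ adapts to the length of each run while \d\d and \d\d\d always take a fixed number of digits.

```javascript
const text = "my cute 235opossum is called 7777";

const runs = text.match(/\d+/g);       // one or more digits per match
const pairs = text.match(/\d\d/g);     // exactly two digits per match
const triples = text.match(/\d\d\d/g); // exactly three digits per match

console.log(runs);    // [ '235', '7777' ]
console.log(pairs);   // [ '23', '77', '77' ]
console.log(triples); // [ '235', '777' ]
```

The fixed-length patterns leave digits behind ('5' for \d\d, the last '7' for \d\d\d), because what remains is too short for another match.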
Remember that quantifiers can be attached to any element. '2' or 'A' is an element, as it's a single character.
So we can do the following:
Regex: 2
Matches: 22 ham sandwiches 44 2
Regex: 2+
Matches: 22 ham sandwiches 44 2 2222
There is a character between '2' and '2222', the whitespace ' ', so the regex sequence stops in between.
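Here is the same idea as a small JavaScript sketch. The variable names and the "ABAB AB A" test string are my own additions, there to show that a group in parentheses behaves like a single element too.

```javascript
const text = "22 ham sandwiches 44 2 2222";

// Without a quantifier, every single '2' is its own match.
const singleTwos = text.match(/2/g);
// With +, neighbouring '2' characters are collected into one match.
const runsOfTwos = text.match(/2+/g);

console.log(singleTwos.length); // 7
console.log(runsOfTwos);        // [ '22', '2', '2222' ]

// A group counts as one element, so + repeats the whole 'AB'.
const groups = "ABAB AB A".match(/(AB)+/g);
console.log(groups); // [ 'ABAB', 'AB' ]
```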
{ }
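If you want to match only a set number of the same element, use '{' and '}'.
Regex: B{4}
Matches: BBB BBBB BBA BBAAABBBAABB AABBBBBBBAAA
This matches four B in succession.
Regex: B{1,3}
Matches: BBB BBBB BBA BBAAABBBAABB AABBBBBBBAAA
This matches between 1 and 3 B in succession. A longer run is split into several matches, so BBBB gives 'BBB' followed by 'B'.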
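To see what the braces do in practice, here is a rough JavaScript sketch. It assumes the two patterns meant above are B{4} and B{1,3}; the variable names are mine.

```javascript
const text = "BBB BBBB BBA BBAAABBBAABB AABBBBBBBAAA";

// Exactly four B in a row.
const exactlyFour = text.match(/B{4}/g);
// Between one and three B in a row, taking as many as possible each time.
const oneToThree = text.match(/B{1,3}/g);

console.log(exactlyFour);
console.log(oneToThree);

// A run of four B is too long for one match, so it is split in two.
const splitRun = "BBBB".match(/B{1,3}/g);
console.log(splitRun); // [ 'BBB', 'B' ]
```

Because the quantifier is greedy, a run longer than three B is consumed three at a time and the remainder becomes its own match.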
Closing Comments
The real power of Regex shows when you combine all these elements and quantifiers.
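Task: find two letters followed by two to three digits
Regex: \w{2}\d{2,3}
Matches: oethu2390uN S<<o eeu9 34.R<>:Et<Eth;,go ogle.com23{H
Note that \w matches digits and the underscore as well as letters. To match letters only, [a-zA-Z]{2}\d{2,3} is the stricter choice.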
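A quick JavaScript sketch of the task, with variable names of my own. Keep in mind that \w also accepts digits and the underscore, so the second half compares it with [a-zA-Z], which is my assumption about what "letters" should mean here. The "ab12 3456" string is a made-up test case.

```javascript
const text = "oethu2390uN S<<o eeu9 34.R<>:Et<Eth;,go ogle.com23{H";

const found = text.match(/\w{2}\d{2,3}/g);
console.log(found); // [ 'hu239', 'om23' ]

// \w accepts digits too, so a run of digits alone can satisfy the pattern.
const loose = "ab12 3456".match(/\w{2}\d{2,3}/g);
// [a-zA-Z] restricts the first two characters to letters.
const strict = "ab12 3456".match(/[a-zA-Z]{2}\d{2,3}/g);

console.log(loose);  // [ 'ab12', '3456' ]
console.log(strict); // [ 'ab12' ]
```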

the example for Regex: B{1,3} is wrong:
Python code:
import re
m = re.findall('B{1,3}', 'BBB BBBB BBA BBAAABBBAABB AABBBBBBBAAA')
for n in m:
    print(n)
Otherwise nice article :)
I think JavaScript and Python have some differences in their regex syntax.