Thomas Sampson

Regular Expression Rules

Period / Full Stop

A period is used in a regular expression as a wildcard, where a . can represent any single character (in most engines, any character except a newline by default). For example

t...s

matches

trees
trams
teens

Important: Each wildcard represents exactly ONE character, not an undefined number of characters, hence t...s would not match the word “tracks”.
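To see the one-character rule in action, here is a minimal sketch using Python's built-in re module (the choice of Python and the word list are my own assumptions, not part of the original notes). It uses fullmatch so that the whole word has to fit the pattern.

```python
import re

# t...s: a literal "t", exactly three wildcard characters, then a literal "s"
pattern = re.compile(r"t...s")

# Illustrative word list; "tracks" and "ts" are included as non-matches
words = ["trees", "trams", "teens", "tracks", "ts"]

# fullmatch only succeeds when the entire word fits the pattern
matched = [w for w in words if pattern.fullmatch(w)]
print(matched)
```

Only the five-letter words come back, because each . has to consume exactly one character.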

Question Mark

The question mark makes the previous character optional: the pattern matches whether that character is present or not.

For example,

colou?r

would match both the British and American spellings, “colour” and “color”. The ? character signifies that the previous character does not need to be present to form a match.
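A quick way to check this is a small Python sketch (Python's re module is my assumption here, and the two extra misspellings are made up to show what does not match):

```python
import re

# u? means "zero or one u" between "colo" and "r"
pattern = re.compile(r"colou?r")

# The last two spellings are hypothetical inputs that should be rejected
spellings = ["colour", "color", "colouur", "colr"]

# Map each spelling to whether the whole string fits the pattern
results = {s: bool(pattern.fullmatch(s)) for s in spellings}
print(results)
```

Note that ? only allows zero or one u, so “colouur” is rejected, and it only affects the u, so the preceding o is still required.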

Asterisk

The asterisk character * is used to represent zero or more instances of the previous character. For example,

www\.my.*\.com

would match…

http://www.mypage.com

http://www.my.com

http://www.myspace.com

http://www.myownpersonalhomepage.com

Note: In this example the character before the asterisk is a . so we accept any number of wildcard characters, including none at all. The backslashes simply escape the other . symbols, which need to be matched as literal periods in our URL.
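The sketch below tries the pattern against the four URLs in Python (my choice of language). As an assumption for illustration only, I have wrapped the .* in parentheses so we can see exactly which text the wildcard run consumed; the matching behaviour is otherwise the same as the pattern above.

```python
import re

# Same pattern as above, with .* wrapped in a capture group for inspection
pattern = re.compile(r"www\.my(.*)\.com")

urls = [
    "http://www.mypage.com",
    "http://www.my.com",
    "http://www.myspace.com",
    "http://www.myownpersonalhomepage.com",
]

# search looks for the pattern anywhere in the string, so "http://" is skipped
captured = {}
for url in urls:
    m = pattern.search(url)
    if m:
        captured[url] = m.group(1)  # the text matched by .*

for url, middle in captured.items():
    print(url, "->", repr(middle))
```

For http://www.my.com the captured text is the empty string, which is the "zero instances" case.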

Plus

The plus symbol + is very similar to the asterisk * symbol; however, it matches one or more instances of the previous character, so at least one must be present. Using the previous example,

www\.my.+\.com

would match…

http://www.myspace.com

http://www.mypage.com

etc

but would not match…

http://www.my.com
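Here is the same comparison as a small Python sketch (Python is my assumption), running both the * and the + versions of the pattern so the difference shows up side by side:

```python
import re

star = re.compile(r"www\.my.*\.com")  # zero or more wildcard characters
plus = re.compile(r"www\.my.+\.com")  # one or more wildcard characters

urls = [
    "http://www.myspace.com",
    "http://www.mypage.com",
    "http://www.my.com",
]

# Record, for each URL, whether each pattern is found somewhere in it
star_results = {u: bool(star.search(u)) for u in urls}
plus_results = {u: bool(plus.search(u)) for u in urls}

for u in urls:
    print(u, "| * :", star_results[u], "| + :", plus_results[u])
```

The only URL the two patterns disagree on is http://www.my.com, where there is nothing between "my" and ".com" for the + to consume.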

{N} Number

This method allows us to specify exactly how many times the previous character must appear. For example

w{3}\.mysite\.com

will only match

http://www.mysite.com

and would not match

w.mysite.com or

ww.mysite.com
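A short Python sketch of this (Python and the extra "wwww" host are my own assumptions). It also shows one thing worth knowing: when you search inside a longer string, a host with four w's still contains three w's in a row, so an unanchored search finds a match there, whereas fullmatch does not.

```python
import re

# w{3}: exactly three "w" characters in a row
pattern = re.compile(r"w{3}\.mysite\.com")

hosts = ["www.mysite.com", "ww.mysite.com", "w.mysite.com"]
results = {h: bool(pattern.search(h)) for h in hosts}
print(results)

# Hypothetical extra case: four w's
extra = "wwww.mysite.com"
found_by_search = bool(pattern.search(extra))        # matches the last three w's
found_by_fullmatch = bool(pattern.fullmatch(extra))  # whole string must fit
print(found_by_search, found_by_fullmatch)
```

If the whole string must have exactly three w's, use fullmatch (or anchors) rather than a plain search.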

Ranges {min,max}

This method allows us to specify the minimum and maximum number of times the previous character may appear. For example, here the range applies only to the 0, not to the whole of 10:

10{1,3} years

Would match the following…

10 years

100 years

1000 years

but would not match

1 years

10000 years

100000 years
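The lists above can be checked with a few lines of Python (my choice of language; the test strings are the ones from the example):

```python
import re

# 0{1,3}: between one and three zeros after the literal "1"
pattern = re.compile(r"10{1,3} years")

texts = [
    "10 years",
    "100 years",
    "1000 years",
    "1 years",
    "10000 years",
    "100000 years",
]

# Keep only the strings that fit the pattern from start to end
accepted = [t for t in texts if pattern.fullmatch(t)]
rejected = [t for t in texts if not pattern.fullmatch(t)]
print(accepted)
print(rejected)
```

“1 years” fails because at least one 0 is required, and the longer numbers fail because more than three zeros sit between the 1 and the space.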

Summary

Quantifier   Description
?            Matches the preceding element 0 or 1 times.
*            Matches the preceding element 0 or more times.
+            Matches the preceding element 1 or more times.
{num}        Matches the preceding element exactly num times.
{min,max}    Matches the preceding element at least min times, but not more than max times.

Taken from http://docs.activestate.com/komodo/4.4/regex-intro.html

The | Symbol

Using the | symbol allows us to offer alternatives (a logical “or”) in our regular expression: the pattern matches if either side of the | matches. Here is an example

Model (R|S)1000

would match

Model R1000

Model S1000

but would not match

Model X1000

Model Z1000
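As a rough sketch in Python (my assumption for the language), the parentheses here also act as a capture group, so we can ask which alternative was actually chosen:

```python
import re

# (R|S): either an "R" or an "S" at this position
pattern = re.compile(r"Model (R|S)1000")

models = ["Model R1000", "Model S1000", "Model X1000", "Model Z1000"]

# For each matching model, record which letter the group captured
chosen = {}
for model in models:
    m = pattern.fullmatch(model)
    if m:
        chosen[model] = m.group(1)

print(chosen)
```

Only the R and S models appear in the result; X and Z are not among the alternatives, so they are skipped.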

Grouping with ( )

Parentheses “()” are used to group characters and expressions within larger, more complex regular expressions. Quantifiers that immediately follow the group apply to the whole group. For example:

(abc){2,3}

matches

abcabc
abcabcabc
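To show why the parentheses matter, this Python sketch (language and extra test strings are my own assumptions) compares the grouped pattern with an ungrouped version, abc{2,3}, where the quantifier applies only to the c:

```python
import re

grouped = re.compile(r"(abc){2,3}")  # the whole "abc" repeated 2 to 3 times
ungrouped = re.compile(r"abc{2,3}")  # "ab" followed by 2 to 3 "c" characters

strings = ["abc", "abcabc", "abcabcabc", "abcabcabcabc", "abcc", "abccc"]

# Collect the strings each pattern accepts in full
grouped_hits = [s for s in strings if grouped.fullmatch(s)]
ungrouped_hits = [s for s in strings if ungrouped.fullmatch(s)]

print(grouped_hits)
print(ungrouped_hits)
```

The two patterns accept completely different strings, which is exactly the effect of the group.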

More rules can be found at http://docs.activestate.com/komodo/4.4/regex-intro.html


Author: tomtech999

I have recently graduated with a 1st class degree in MComp Games Software Development at Sheffield Hallam University, focusing primarily on application development in C++, with experience in graphics programming, scripting languages, DVCS/VCS and web technology. In my spare time I enjoy Drumming, Reading and Snowboarding!
