A handy list of Python string methods
Regular Expression Rules
Period / Full Stop
A period is used in regular expressions as a wildcard: a . can represent any single character (by default, anything except a newline). For example
t...s
matches
trees trams teens
Important: Each wildcard represents ONE character only, not an undefined number of characters, hence t...s would not match the word “tracks”, which has four letters between the t and the s.
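To try this out, here is a minimal sketch using Python's standard re module. The word list is just the examples above, and re.fullmatch is used (my choice, not the only option) so that the whole word has to fit the pattern.

```python
import re

# Each "." stands for exactly one character, so t...s only fits five-letter words.
pattern = re.compile(r"t...s")

words = ["trees", "trams", "teens", "tracks"]
matches = [w for w in words if pattern.fullmatch(w)]
print(matches)  # ['trees', 'trams', 'teens']
```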
Question Mark
The question mark makes the previous character optional: it may appear once or not at all.
For example,
colou?r
would match both the British and American spellings, “colour” and “color”. The ? signifies that the previous character does not need to be present to form a match.
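A quick check of the same pattern in Python, as a sketch only. The third test word, "colouur", is one I have added to show that ? allows at most one of the previous character.

```python
import re

# "u?" means the "u" may appear once or not at all.
pattern = re.compile(r"colou?r")

results = {w: bool(pattern.fullmatch(w)) for w in ["colour", "color", "colouur"]}
print(results)  # {'colour': True, 'color': True, 'colouur': False}
```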
Asterisk
The asterisk character * is used to represent zero or more instances of the previous character. For example,
www\.my.*\.com
would match…
www.mypage.com
www.my.com
www.myspace.com
www.myownpersonalhomepage.com
Note: In this example the character before the asterisk is a . so we will accept any number of wildcard characters, including none. The backslashes are there simply to escape the . symbols that need to be matched as literal dots in our URL.
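The URL example can be checked in Python roughly like this. It is a sketch using the standard re module. The extra string "wwwxmy.com" is a made-up test value showing why the escaped dot matters.

```python
import re

# "\." is a literal dot; ".*" is any character, repeated zero or more times.
pattern = re.compile(r"www\.my.*\.com")

urls = [
    "www.mypage.com",
    "www.my.com",
    "www.myspace.com",
    "www.myownpersonalhomepage.com",
]
all_match = all(pattern.fullmatch(u) is not None for u in urls)
print(all_match)  # True

# Because the first dot is escaped, another character in its place does not match.
literal_dot_required = pattern.fullmatch("wwwxmy.com") is None
print(literal_dot_required)  # True
```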
Plus
The plus symbol + is very similar to the asterisk *, but it only matches one or more instances of the previous character. Using the previous example,
www\.my.+\.com
would match
www.myspace.com
www.mypage.com
etc
but would not match…
www.my.com
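To see the difference between * and + side by side, here is a small sketch in Python. The variable names star and plus are just labels I have chosen for the two patterns above.

```python
import re

star = re.compile(r"www\.my.*\.com")  # zero or more characters after "my"
plus = re.compile(r"www\.my.+\.com")  # at least one character after "my"

urls = ["www.myspace.com", "www.mypage.com", "www.my.com"]
results = {
    u: (bool(star.fullmatch(u)), bool(plus.fullmatch(u)))
    for u in urls
}
for url, (star_ok, plus_ok) in results.items():
    print(url, star_ok, plus_ok)
# www.myspace.com True True
# www.mypage.com True True
# www.my.com True False
```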
{N} Number
This allows us to specify exactly how many of the previous character we will match. For example
w{3}\.mysite\.com
will only match
www.mysite.com
and would not match
w.mysite.com or
ww.mysite.com
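Here is the same check as a Python sketch. One thing worth knowing: the "will only match" claim holds when the whole string has to fit the pattern (re.fullmatch). A search inside a longer string, such as the made-up "wwww.mysite.com", will still find www.mysite.com within it.

```python
import re

# "w{3}" means exactly three w characters in a row.
pattern = re.compile(r"w{3}\.mysite\.com")

hosts = ["www.mysite.com", "w.mysite.com", "ww.mysite.com"]
matched = [h for h in hosts if pattern.fullmatch(h)]
print(matched)  # ['www.mysite.com']

# fullmatch needs the whole string to fit; search looks for the pattern anywhere.
whole = pattern.fullmatch("wwww.mysite.com") is not None
inside = pattern.search("wwww.mysite.com") is not None
print(whole, inside)  # False True
```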
Ranges {min,max}
This allows us to specify a minimum and maximum number of times the previous character may appear. The quantifier applies only to the character immediately before it (in the example below, the 0). For example
10{1,3} years
Would match the following…
10 years
100 years
1000 years
but would not match
1 years
10000 years
100000 years
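The range example can be tried in Python along these lines. This is a sketch, and the phrase list simply combines the matching and non-matching examples above.

```python
import re

# "0{1,3}" means between one and three zeros; the leading "1" is not repeated.
pattern = re.compile(r"10{1,3} years")

phrases = [
    "1 years",
    "10 years",
    "100 years",
    "1000 years",
    "10000 years",
    "100000 years",
]
matched = [p for p in phrases if pattern.fullmatch(p)]
print(matched)  # ['10 years', '100 years', '1000 years']
```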
Summary
| Quantifier | Description |
| ? | Matches the preceding element 0 or 1 times. |
| * | Matches the preceding element 0 or more times. |
| + | Matches the preceding element 1 or more times. |
| {num} | Matches the preceding element num times. |
| {min,max} | Matches the preceding element at least min times, but not more than max times. |
taken from http://docs.activestate.com/komodo/4.4/regex-intro.html
The | Symbol
Using the | symbol allows us to express alternatives (an “or”) in our regular expression. Here is an example
Model (R|S)1000
would match
Model R1000
Model S1000
but would not match
Model X1000
Model Z1000
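A short Python sketch of the alternation example. As a side effect of the parentheses, the letter that matched is captured and can be read back with group(1), which I have included as an extra illustration.

```python
import re

# "(R|S)" accepts either an R or an S at that position.
pattern = re.compile(r"Model (R|S)1000")

models = ["Model R1000", "Model S1000", "Model X1000", "Model Z1000"]
matched = [m for m in models if pattern.fullmatch(m)]
print(matched)  # ['Model R1000', 'Model S1000']

# The parentheses also capture which alternative was used.
letter = pattern.fullmatch("Model S1000").group(1)
print(letter)  # S
```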
Grouping with ( )
Parentheses “()” are used to group characters and expressions within larger, more complex regular expressions. Quantifiers that immediately follow the group apply to the whole group. For example:
(abc){2,3}
matches
abcabc
abcabcabc
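To show what the grouping changes, this Python sketch compares the grouped pattern with an ungrouped version. The ungrouped pattern abc{2,3} is my own addition for contrast: without parentheses the quantifier applies to the c alone.

```python
import re

grouped = re.compile(r"(abc){2,3}")   # the whole "abc" repeated 2 or 3 times
ungrouped = re.compile(r"abc{2,3}")   # "ab" followed by 2 or 3 "c" characters

grouped_results = {
    s: bool(grouped.fullmatch(s)) for s in ["abc", "abcabc", "abcabcabc"]
}
print(grouped_results)  # {'abc': False, 'abcabc': True, 'abcabcabc': True}

ungrouped_results = {
    s: bool(ungrouped.fullmatch(s)) for s in ["abcc", "abcabc"]
}
print(ungrouped_results)  # {'abcc': True, 'abcabc': False}
```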
More rules can be found at http://docs.activestate.com/komodo/4.4/regex-intro.html
Regular Expressions in Python
A really useful link to help you get to grips with regular expressions. Although the code examples are written in Python, don’t be put off if you are not familiar with the language. The article provides an overview of regular expressions and how to use them, giving you the understanding to implement them in your language of choice.

