Archive for May 2008

Regular expressions

Regular expressions are widely used as a filter criterion: a pattern describes a set of strings, and each input is tested against that pattern.
That’s the theory …
In practice, regular expressions are often used, for example, to filter strings or to write rewrite rules for Apache. To understand them, you have to learn their notation, which shares some symbols with EBNF (such as | and parentheses). In this post I would like to explain the basics, so that you can understand how regular expressions work:

| - The pipe symbol stands for a logical “or”.
() - Round brackets indicate a grouping.
e.g. (a|b) stands for “a or b”.
[] - Square brackets define a set of characters, exactly one of which must occur at that position. For example, [0-6] matches a single digit from 0 to 6.
[a-z] matches a single lowercase letter of the alphabet.
You can also combine ranges: [a-zA-Z0-9] matches any single Latin letter or digit.
[^f] - A ^ directly after the opening bracket negates the set: any character except f can occur.
. - The dot stands for “any character” (in most engines, any character except a line break by default). Note: to match a literal dot, escape it with a backslash: “\.”.
? - The preceding expression is optional.
Example: (aaa)(abc)? matches all strings containing “aaaabc”, but also those containing just “aaa”.
+ - The preceding expression occurs at least once, but it can also occur many times.
Example: (aaa)+ matches “aaa”, but also “aaaaaa”, “aaaaaaaaa”, etc.
* - The preceding expression can occur any number of times, including not at all.
Example: [a-z]* matches any run of lowercase letters, including the empty string.
{min,max} - This rule defines how often the preceding expression may occur.
For example, [0-9]{1,2} means that a digit from 0 to 9 must occur at least once and at most twice.
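
The rules above can be tried out directly. Here is a minimal sketch in Python, using the standard re module. The helper name matches is only an illustration, and it checks whether the whole string fits the pattern (re.fullmatch) instead of searching inside a longer text, so the results are easy to predict.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if the whole of `text` fits `pattern` (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None


# | and (): alternation inside a group
print(matches(r"(a|b)", "b"))            # True
print(matches(r"(a|b)", "c"))            # False

# []: character sets and ranges
print(matches(r"[0-6]", "3"))            # True
print(matches(r"[0-6]", "7"))            # False
print(matches(r"[a-zA-Z0-9]", "Q"))      # True

# [^f]: negated set, anything except f
print(matches(r"[^f]", "g"))             # True
print(matches(r"[^f]", "f"))             # False

# . matches any character, \. only a literal dot
print(matches(r".", "x"))                # True
print(matches(r"\.", "x"))               # False

# ?: the second group is optional
print(matches(r"(aaa)(abc)?", "aaa"))    # True
print(matches(r"(aaa)(abc)?", "aaaabc")) # True

# +: one or more repetitions of the group
print(matches(r"(aaa)+", "aaaaaa"))      # True
print(matches(r"(aaa)+", "aaaa"))        # False

# *: zero or more repetitions
print(matches(r"[a-z]*", ""))            # True

# {min,max}: between 1 and 2 digits
print(matches(r"[0-9]{1,2}", "42"))      # True
print(matches(r"[0-9]{1,2}", "123"))     # False
```

Note that the patterns are written without spaces: in most engines a space inside a pattern is itself a character that must be matched.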

If you have problems and cannot find your mistake, I recommend the program The Regex Coach, which makes it easy to test and debug an expression.
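
If no such tool is at hand, a few lines of code can serve as a rough substitute. The following Python sketch is not related to The Regex Coach. The function name show_matches and the sample strings are made up for illustration. It reports which part of each sample the pattern finds, and a malformed pattern is rejected with re.error when it is compiled.

```python
import re


def show_matches(pattern, samples):
    """Map each sample to the first substring the pattern finds, or None."""
    compiled = re.compile(pattern)  # raises re.error if the pattern is malformed
    results = {}
    for sample in samples:
        found = compiled.search(sample)
        results[sample] = found.group(0) if found else None
    return results


print(show_matches(r"[0-9]{1,2}", ["room 7", "year 2008", "none"]))
# {'room 7': '7', 'year 2008': '20', 'none': None}

try:
    show_matches(r"(aaa", ["aaa"])  # unbalanced bracket
except re.error as err:
    print("Invalid pattern:", err)
```

Seeing the exact substring that was found (here “20” instead of “2008”) often makes it obvious where the pattern is too loose or too strict.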