Friday, July 15, 2005

Regular expression help from a C# newsgroup

: I am trying to parse out the apartment number in a regular expression:
:
: If I use
:
: Regex regex = new
: Regex(@"[...](?\bAPT|#|UNIT\b)[...]",
: RegexOptions.ExplicitCapture);
:
: I will be able to parse out "100 main ST # C"
:
: But if I move "#" position in regular expression, I won't be able to
: parse out the same address anymore.
:
: Regex regex = new
: Regex(@"[...](?\bAPT|UNIT|#\b)[...]",
: RegexOptions.ExplicitCapture);

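The latter fails to match because there's no word boundary (\b) between
an octothorpe and a space. Alternation binds more loosely than
concatenation, so the trailing \b applies only to the last alternative:
it belongs to UNIT in the first pattern and to # in the second. Remember,
a word boundary occurs between a \w and a \W or vice versa, but '#' and
' ' both match \W.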
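
To see the boundary rule in isolation, here is a minimal sketch. The class
name WordBoundaryDemo and the helper Matches are made up for illustration,
and the two alternations are cut down to just the group bodies from the
question (the parts elided there as [...] are left out).

```csharp
using System;
using System.Text.RegularExpressions;

static class WordBoundaryDemo
{
    // Hypothetical helper: true when the pattern matches anywhere in the input.
    public static bool Matches(string pattern, string input) =>
        Regex.IsMatch(input, pattern);

    static void Main()
    {
        const string address = "100 main ST # C";

        // "#\b" needs a word character right after '#'.
        Console.WriteLine(Matches(@"#\b", address));  // False: '#' is followed by a space
        Console.WriteLine(Matches(@"#\b", "ST #C"));  // True: '#' (\W) is followed by 'C' (\w)

        // Same alternatives, different order: the trailing \b sticks to
        // whichever alternative comes last.
        Console.WriteLine(Matches(@"\bAPT|#|UNIT\b", address)); // True: bare '#'
        Console.WriteLine(Matches(@"\bAPT|UNIT|#\b", address)); // False: '#' now carries the \b
    }
}
```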

A start in the right direction is (line breaks inserted):

(?.+)
(?\b(APT|UNIT)\b|#(?=\s+))
(?.+)
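
As a rough sketch of how a pattern of this shape might be used from C#:
the group names street, type, and number, the class ApartmentParser, and
the method ApartmentNumber are all assumptions made for this example, not
names taken from the newsgroup thread. The three pieces are joined without
whitespace, and the captured text is trimmed because the greedy .+ groups
keep the surrounding spaces.

```csharp
using System;
using System.Text.RegularExpressions;

static class ApartmentParser
{
    // Group names (street, type, number) are hypothetical placeholders.
    static readonly Regex Address = new Regex(
        @"(?<street>.+)" +
        @"(?<type>\b(APT|UNIT)\b|#(?=\s+))" +
        @"(?<number>.+)",
        RegexOptions.ExplicitCapture);

    // Returns the trimmed text after the designator, or null when nothing matches.
    public static string ApartmentNumber(string input)
    {
        Match m = Address.Match(input);
        return m.Success ? m.Groups["number"].Value.Trim() : null;
    }

    static void Main()
    {
        Console.WriteLine(ApartmentNumber("100 main ST # C"));    // C
        Console.WriteLine(ApartmentNumber("12 Oak Ave UNIT 4B")); // 4B
    }
}
```

Note that the lookahead (?=\s+) means "#C" with no space after the
octothorpe is not matched by this version, and the match is case-sensitive
unless RegexOptions.IgnoreCase is added.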

Hope this helps,
Greg
