Wednesday, August 10, 2011

Grouping in Python Regular Expression

In Python's re module, the most useful trick is grouping. It lets us dissect a string by writing a regular expression divided into several subgroups, each matching a different component of interest. For example, to match a header line such as email: author@example.com and extract the field name (email) and the field value (author@example.com), the following code can be used:
import re
header = "email: author@example.com"
m = re.match(r"(\S+):\s+(\S+)", header)  # match once, reuse the result
field_name = m.group(1)   # 'email'
field_value = m.group(2)  # 'author@example.com'
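A single match object exposes all of its groups, so there is no need to run the same match twice. The sketch below shows a few standard match-object accessors (group() with no argument or with several indexes, groups(), and span()) on the same header string. The helper name parse_header and the constant HEADER_RE are hypothetical, introduced only for illustration.

```python
import re

# Compile once; group 1 is the field name, group 2 the field value.
HEADER_RE = re.compile(r"(\S+):\s+(\S+)")

def parse_header(header):
    """Hypothetical helper: return (name, value), or None if there is no match."""
    m = HEADER_RE.match(header)
    if m is None:
        return None
    return m.groups()  # tuple of all capturing groups, in order

m = HEADER_RE.match("email: author@example.com")
print(m.group())      # whole match: 'email: author@example.com'
print(m.group(1, 2))  # several groups at once: ('email', 'author@example.com')
print(m.span(2))      # start and end offsets of group 2: (7, 25)
print(parse_header("email: author@example.com"))  # ('email', 'author@example.com')
print(parse_header("not a header"))               # None
```

Checking for None before calling group() avoids an AttributeError when the line does not look like a header.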
Sometimes we need parentheses to group part of a pattern but don't want to retrieve the group content. In that case, use a non-capturing group (?:...), where ... should be replaced with any RE pattern:
>>> import re
>>> header = "email: author@example.com"
>>> re.match(r"(?:\S+):\s+(\S+)", header).groups()
('author@example.com',)
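Non-capturing groups matter most when parentheses are needed only for alternation or repetition. The sketch below is illustrative (the URL and the names url_re and rep_re are made up); it uses the compiled pattern's groups attribute, which counts the capturing groups, to show that (?:...) adds none.

```python
import re

# Parentheses group the alternation only; (?:...) keeps the scheme out of
# the numbered groups, so group 1 is everything after "://".
url_re = re.compile(r"(?:https?|ftp)://(\S+)")
m = url_re.match("https://example.com/index.html")
print(m.group(1))     # 'example.com/index.html'
print(url_re.groups)  # 1 capturing group

# A quantifier applied to a non-capturing group: one or more "ab" pairs.
rep_re = re.compile(r"(?:ab)+")
print(rep_re.fullmatch("ababab") is not None)  # True
print(rep_re.groups)                           # 0, nothing is captured

# The header pattern from above, with and without a capturing first group.
print(re.compile(r"(\S+):\s+(\S+)").groups)    # 2
print(re.compile(r"(?:\S+):\s+(\S+)").groups)  # 1
```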
If we want to retrieve the group content by name instead of by group index, use (?P<name>...), where name is replaced with a convenient group name and ... is again replaced with any RE pattern:
>>> import re
>>> header = "email: author@example.com"
>>> re.match(r"(?P<field_name>\S+):\s+(\S+)", header).group('field_name')
'email'
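When every group of interest is named, the match can be turned straight into a dictionary with groupdict(), and a name can be referred back to inside the same pattern with (?P=name). The sketch below is illustrative; the second group name field_value, the two-line sample text and the "doubled word" pattern are made up for the example.

```python
import re

# Both groups are named, so a match converts directly into a dict.
header_re = re.compile(r"(?P<field_name>\S+):\s+(?P<field_value>\S+)")

m = header_re.match("email: author@example.com")
print(m.groupdict())  # {'field_name': 'email', 'field_value': 'author@example.com'}
print(m.group(1))     # named groups are still numbered: 'email'

# Collect several header lines into one dict (sample text is made up).
text = "email: author@example.com\nsubject: grouping"
fields = {match.group("field_name"): match.group("field_value")
          for match in header_re.finditer(text)}
print(fields)  # {'email': 'author@example.com', 'subject': 'grouping'}

# (?P=word) matches the same text the named group "word" matched earlier.
doubled = re.compile(r"\b(?P<word>\w+)\s+(?P=word)\b")
print(doubled.search("this is is a typo").group("word"))  # 'is'
```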
