I have a couple of email addresses, 'support@company.com' and '1234567@tickets.company.com'.
In Perl, I could take the To: line of a raw email and find either of the above addresses with
    /\w+@(tickets\.)?company\.com/i
In Python, I simply wrote the above regex as '\w+@(tickets\.)?company\.com', expecting the same result. However, support@company.com isn't found at all, and a findall on the second returns a list containing only 'tickets.'. So clearly the '(tickets\.)?' is the problem area, but what exactly is the difference in regular expression rules between Perl and Python that I'm missing?
-
Two problems jump out at me:
- You need to use a raw string to avoid having to escape "\"
- You need to escape "."
So try:
    r'\w+@(tickets\.)?company\.com'
EDIT: Sample output:
    >>> import re
    >>> exp = re.compile(r'\w+@(tickets\.)?company\.com')
    >>> bool(exp.match("s@company.com"))
    True
    >>> bool(exp.match("1234567@tickets.company.com"))
    True
jcoon : I second this suggestion.
BipedalShark : #2 is just me being a newb at stackoverflow. Fixed the initial post. ;)
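One caveat about the sample output above: match() only succeeds when the pattern matches at the very start of the string, so it works on a bare address but would not find an address inside a full To: line. A minimal sketch of the difference, assuming a made-up header (the header text and variable names are illustrative, not from the original post):

```python
import re

# Same pattern as in the answer above; re.IGNORECASE mirrors Perl's /i flag.
exp = re.compile(r'\w+@(tickets\.)?company\.com', re.IGNORECASE)

# Hypothetical To: header, made up for illustration.
header = "To: Support Desk <1234567@tickets.company.com>"

# match() anchors at position 0, where the text is "To: ...", so it fails.
anchored = exp.match(header)

# search() scans the whole string and finds the address.
found = exp.search(header)

# group(0) is always the entire match, regardless of capturing groups.
address = found.group(0) if found else None
print(anchored, address)
```

Using group(0) on the match object is probably the closest equivalent to what the Perl one-liner was doing, since it returns the full matched text rather than the contents of the parenthesized group.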
-
I think the problem is in your expectations of extracted values. Try using this in your current Python code:
    '(\w+@(?:tickets\.)?company\.com)'
-
The documentation for re.findall:
    findall(pattern, string, flags=0)
    Return a list of all non-overlapping matches in the string. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.
Since (tickets\.) is a group, findall returns that instead of the whole match. If you want the whole match, put a group around the whole pattern and/or use non-grouping matches, i.e.
    r'(\w+@(tickets\.)?company\.com)'
    r'\w+@(?:tickets\.)?company\.com'
Note that you'll have to pick out the first element of each tuple returned by findall in the first case.
Axeman : Okay, but interestingly, not *obvious*.
-
There isn't a difference in the regexes, but there is a difference in what you are looking for. In both languages, your regex captures only "tickets." (when it is present). You probably want something like this:
    #!/usr/bin/python
    import re
    # Capture the whole address; (?:...) keeps "tickets." optional without capturing it.
    regex = re.compile(r"(\w+@(?:tickets\.)?company\.com)")
    a = [
        "foo@company.com",
        "foo@tickets.company.com",
        "foo@ticketsacompany.com",
        "foo@compant.org",
    ]
    for string in a:
        print(regex.findall(string))