Monday, February 21, 2011

Regular expression to match start of filename and filename extension

What is the regex to match filenames that start with 'Run' and have a filename extension of '.py'?

It should match any of the following:

RunFoo.py
RunBar.py
Run42.py

It should not match:

myRunFoo.py
RunBar.py1
Run42.txt

(I am a regex novice and am more familiar with sql wildcards so the equivalent of what I am searching for would be "..LIKE 'Run%.py'...")

From stackoverflow
  • This probably doesn't fully comply with file-naming standards, but here it goes:

    /^Run[\w]*?\.py$/
    
    Zing- : looks like a Perl solution for a question tagged python... but I am not a python expert :p and as jobscry pointed out your solution is case-sensitive.
    Zing- : *err has Rob Howard pointed out that is
    Brian : Shouldn't you use .*, rather than \w - punctuation and whitespace etc should probably still be considered part of the filename. eg "Run.foo.py"
  • /^Run.*\.py$/
    

    Or, in python specifically:

    import re
    re.match(r"^Run.*\.py$", stringtocheck)
    

    This will match "Runfoobar.py", but not "runfoobar.PY". To make it case insensitive, instead use:

    re.match(r"^Run.*\.py$", stringtocheck, re.I)
    
  • mabye:

    ^Run.*\.py$
    

    just a quick try

    Brian : You need .*, rather than .? (which will only match a single character)
    jobscry : doh, thanks Brian
  • For a regular expression, you would use:

    re.match(r'Run.*\.py$')
    

    A quick explanation:

    • . means match any character.
    • * means match any repetition of the previous character (hence .* means any sequence of chars)
    • \ is an escape to escape the explicit dot
    • $ indicates "end of the string", so we don't match "Run_foo.py.txt"

    However, for this task, you're probably better off using simple string methods. ie.

    filename.startswith("Run") and filename.endswith(".py")
    

    Note: if you want case insensitivity (ie. matching "run.PY" as well as "Run.py", use the re.I option to the regular expression, or convert to a specific case (eg filename.lower()) before using string methods.

    Zing- : 1. you don't have to specify start of line for python regular expression match? 2. * is zero or more match (i.e. so Run.py would be acceptable)
    Zing- : Also, how would you make it case-insensitive?
    Brian : re.match already specifies the start of the string (as opposed to re.search, which doesn't). "Run.py" *should* match, given the definition (It starts with Run, and has a .py extension). For case insensitivity, see the note at the end.
  • Warning:

    • jobscry's answer ("^Run.?.py$") is incorrect (will not match "Run123.py", for example).
    • orlandu63's answer ("/^Run[\w]*?.py$/") will not match "RunFoo.Bar.py".

    (I don't have enough reputation to comment, sorry.)

    Ates Goral : We'll get you those rep points :)
    Rob Howard : Cripes, that was quick. Thanks. :-)
  • If you write a slightly more complex regular expression, you can get an extra feature: extract the bit between "Run" and ".py":

    >>> import re
    >>> regex = '^Run(?P<name>.*)\.py$'
    >>> m = re.match(regex, 'RunFoo.py')
    >>> m.group('name')
    'Foo'
    

    (the extra bit is the parentheses and everything between them, except for '.*' which is as in Rob Howard's answer)

  • I don't really understand why you're after a regular expression to solve this 'problem'. You're just after a way to find all .py files that start with 'Run'. So this is a simple solution that will work, without resorting to compiling an running a regular expression:

    import os
    for filename in os.listdir(dirname):
        root, ext = os.path.splitext(filename)
        if root.startswith('Run') and ext == '.py':
            print filename
    
  • You don't need a regular expression, you can use glob, which takes wildcards e.g. Run*.py

    For example, to get those files in your current directory...

    import os, glob
    files = glob.glob( "".join([ os.getcwd(), "\\Run*.py"]) )
    

0 comments:

Post a Comment