Shell-Style Wildcard String Matching in Python

Suppose you want to match strings in Python with the same wildcards you would use in a shell, for example *.py or nginx-access-2018060[0-9]*.log.

Python's fnmatch module provides two functions for this kind of shell-style wildcard matching: fnmatch() and fnmatchcase(). Their source code is shown below:

fnmatch

def fnmatch(name, pat):
    """Test whether FILENAME matches PATTERN.

    Patterns are Unix shell style:

    *       matches everything
    ?       matches any single character
    [seq]   matches any character in seq
    [!seq]  matches any char not in seq

    An initial period in FILENAME is not special.
    Both FILENAME and PATTERN are first case-normalized
    if the operating system requires it.
    If you don't want this, use fnmatchcase(FILENAME, PATTERN).
    """
    name = os.path.normcase(name)
    pat = os.path.normcase(pat)
    return fnmatchcase(name, pat)

fnmatchcase

@functools.lru_cache(maxsize=256, typed=True)
def _compile_pattern(pat):
    if isinstance(pat, bytes):
        pat_str = str(pat, 'ISO-8859-1')
        res_str = translate(pat_str)
        res = bytes(res_str, 'ISO-8859-1')
    else:
        res = translate(pat)
    return re.compile(res).match

def fnmatchcase(name, pat):
    """Test whether FILENAME matches PATTERN, including case.

    This is a version of fnmatch() which doesn't case-normalize
    its arguments.
    """
    match = _compile_pattern(pat)
    return match(name) is not None
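
To see what the source above does with a pattern, you can call the public fnmatch.translate() function yourself. The following is only a small sketch: the exact regular expression text differs between Python versions, so it is not shown here, and the variable names are just for illustration.

```python
import fnmatch
import re

# translate() turns a shell-style pattern into a regular expression string.
regex_text = fnmatch.translate('nginx-access-2018060[0-9]*.log')

# Compile it and keep the bound match method, which is roughly what
# fnmatchcase() does internally through its cached helper.
matcher = re.compile(regex_text).match

matched = matcher('nginx-access-20180609.log') is not None
not_matched = matcher('nginx-error-20180609.log') is not None
print(matched, not_matched)
```

Because the compiled pattern is cached, calling fnmatchcase() repeatedly with the same pattern does not recompile the regular expression each time.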

Some simple examples:

>>> from fnmatch import fnmatch, fnmatchcase
>>> fnmatch('hello.py', '*.py')
True
>>> fnmatch('hello.py', '?ello.py')
True
>>> fnmatch('nginx-access-20180609.log', 'nginx-access-2018060[0-9]*')
True
>>> fnmatch('nginx-access-20180609.log', 'nginx-access-2018060[0-9].log')
True
>>> file_names = ['nginx-access-20180620.log', 'hello.py', 'config.ini', 'sendData.py']
>>> [name for name in file_names if fnmatch(name, '*.py')]
['hello.py', 'sendData.py']
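
The list comprehension above can also be written with fnmatch.filter(), which the same module provides for filtering a list of names against one pattern. A minimal sketch, reusing the file names from the example (like fnmatch(), it normalizes case according to the operating system):

```python
import fnmatch

file_names = ['nginx-access-20180620.log', 'hello.py', 'config.ini', 'sendData.py']

# filter() returns the names that match the pattern, in their original order.
py_files = fnmatch.filter(file_names, '*.py')
print(py_files)
```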

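There is one caveat: whether fnmatch() is case-sensitive depends on the underlying operating system, because it normalizes case with os.path.normcase(). The same call can therefore return different results on different systems: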

>>> # On OS X (Mac)
>>> fnmatch('test.txt', '*.TXT')
False
>>> # On Windows
>>> fnmatch('test.txt', '*.TXT')
True
>>>

If this difference matters to you, use fnmatchcase() instead. It always matches with exact case, as shown here:

>>> fnmatchcase('test.txt', '*.TXT')
False
>>> fnmatchcase('test.txt', '*.txt')
True
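
If you want the opposite behavior, case-insensitive matching on every platform, one possible approach is to lowercase both arguments before calling fnmatchcase(). The helper name fnmatch_ignore_case below is hypothetical and not part of the fnmatch module; treat it as a sketch.

```python
from fnmatch import fnmatchcase


def fnmatch_ignore_case(name, pat):
    """Hypothetical helper: match name against pat, ignoring case on any OS."""
    # Lowercasing both sides removes the dependence on os.path.normcase().
    return fnmatchcase(name.lower(), pat.lower())


print(fnmatch_ignore_case('test.txt', '*.TXT'))
print(fnmatch_ignore_case('test.txt', '*.py'))
```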

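An often overlooked feature of these two functions is that they are also useful for strings that are not file names. For example, suppose you have a list of street addresses: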

addresses = [
    '5412 N CLARK ST',
    '1060 W ADDISON ST',
    '1039 W GRANVILLE AVE',
    '2122 N CLARK ST',
    '4802 N BROADWAY',
]

You can then write list comprehensions like these:

>>> from fnmatch import fnmatchcase
>>> [addr for addr in addresses if fnmatchcase(addr, '* ST')]
['5412 N CLARK ST', '1060 W ADDISON ST', '2122 N CLARK ST']
>>> [addr for addr in addresses if fnmatchcase(addr, '54[0-9][0-9] *CLARK*')]
['5412 N CLARK ST']
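
The other two wildcard forms from the docstring, ? and [!seq], work on the same data. The patterns below are only illustrative choices:

```python
from fnmatch import fnmatchcase

addresses = [
    '5412 N CLARK ST',
    '1060 W ADDISON ST',
    '1039 W GRANVILLE AVE',
    '2122 N CLARK ST',
    '4802 N BROADWAY',
]

# '?' matches exactly one character: a four-character number starting with 2.
starts_with_2 = [addr for addr in addresses if fnmatchcase(addr, '2??? N *')]

# '[!N]' matches one character that is not 'N': any direction except north.
not_north = [addr for addr in addresses if fnmatchcase(addr, '* [!N] *')]

print(starts_with_2)
print(not_north)
```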

The matching power of fnmatch() sits between simple string methods and full regular expressions. When simple wildcards are all a data-processing task needs, it is a good choice.

If your code needs to match actual file names on disk, the glob module is usually the better choice. A simple example:

[root@nock opt]# pwd
/opt
[root@nock opt]# ls
file1.py  file2.py  file3.py  file4.py
[root@nock opt]# python
Python 3.5.1 (default, Nov 20 2015, 02:00:19) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import glob
>>> pyfiles = glob.glob('/opt/*.py')
>>> pyfiles
['/opt/file1.py', '/opt/file2.py', '/opt/file3.py', '/opt/file4.py']

With fnmatch(), the equivalent looks like this:

>>> import os
>>> from fnmatch import fnmatch
>>> pyfiles = [name for name in os.listdir('/opt/') if fnmatch(name, '*.py')]
>>> pyfiles
['file1.py', 'file2.py', 'file3.py', 'file4.py']

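Both approaches find the same files, although glob returns the paths with their directory prefix while the os.listdir() version returns bare file names. The glob version is also simpler and clearer.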
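
To compare the two approaches without depending on the contents of /opt, here is a self-contained sketch that creates a temporary directory. The file names are made up for the example, and the results are sorted because neither glob.glob() nor os.listdir() guarantees an order.

```python
import glob
import os
import tempfile
from fnmatch import fnmatch

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create a few empty files to search through.
    for file_name in ('file1.py', 'file2.py', 'notes.txt'):
        open(os.path.join(tmp_dir, file_name), 'w').close()

    # glob matches against the file system and returns paths with the directory.
    glob_paths = sorted(glob.glob(os.path.join(tmp_dir, '*.py')))

    # os.listdir() returns bare names, so the directory is joined back on.
    listed_paths = sorted(
        os.path.join(tmp_dir, name)
        for name in os.listdir(tmp_dir)
        if fnmatch(name, '*.py')
    )

    glob_names = [os.path.basename(path) for path in glob_paths]

print(glob_names)
print(glob_paths == listed_paths)
```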
