Ignore case in glob() on Linux

Question

I'm writing a script which will have to work on directories which are modified by hand by Windows and Linux users alike. The Windows users tend to not care at all about case in assigning filenames.

Is there a way to handle this on the Linux side in Python, i.e. can I get a case-insensitive, glob-like behaviour?

Geoffrey Irving · Accepted Answer · 2018-05-03 15:00:05Z

65

You can replace each alphabetic character c with [cC], via

import glob
def insensitive_glob(pattern):
    def either(c):
        return '[%s%s]' % (c.lower(), c.upper()) if c.isalpha() else c
    return glob.glob(''.join(map(either, pattern)))
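To see what actually reaches glob.glob, the character rewrite can be pulled out on its own. This is only an illustrative sketch: `expand_case` is a hypothetical name, not part of the answer or of the standard library. A pattern that already contains bracket expressions such as `[ab]` is not handled by this simple rewrite.

```python
def expand_case(pattern):
    """Rewrite every letter c in a glob pattern as the character class [cC]."""
    return ''.join(
        '[%s%s]' % (c.lower(), c.upper()) if c.isalpha() else c
        for c in pattern
    )


print(expand_case('*.txt'))  # *.[tT][xX][tT]
```

Digits, wildcards and path separators pass through unchanged, so only the letters of the pattern become case-insensitive.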

edited May 3, 2018 at 15:00

answered Jun 4, 2012 at 18:57

Geoffrey Irving

3

To be a little more pythonic... or at least make pylint happy: return glob.glob(''.join(either(char) for char in pattern))
– shao.lo
Commented Apr 26, 2015 at 1:55
10

shao.lo: Yes, that does have the advantage of being longer.
– Geoffrey Irving
Commented Apr 26, 2015 at 3:55
2

This solution has serious drawbacks, so be careful. First, glob() will fail with this kind of pattern when it includes a Windows drive letter. Second, the same applies to "magic" folders such as "sysnative".
– payet_s
Commented Jun 22, 2016 at 11:00
1

@GeoffreyIrving Ha, and slower... (pattern of size 1k shown)
In[10] %timeit ''.join(either(char) for char in pattern)
392 µs ± 5.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In[11] %timeit ''.join(map(either, pattern))
358 µs ± 7.68 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
– craymichael
Commented Dec 18, 2020 at 22:28
2

@craymichael it looks like you can shave another 10% off the runtime in Python 3.6+ by using f'[{c.lower()}{c.upper()}]' instead of the % operator to create the string. I also find that pretty legible with appropriate syntax highlighting!
– Milosz
Commented Aug 30, 2022 at 6:00

Fred Foo · Accepted Answer · 2011-11-16 12:09:37Z

37

Use case-insensitive regexes instead of glob patterns. fnmatch.translate generates a regex from a glob pattern, so

re.compile(fnmatch.translate(pattern), re.IGNORECASE)

gives you a case-insensitive version of a glob pattern as a compiled RE.

Keep in mind that, if the filesystem is hosted by a Linux box on a Unix-like filesystem, users will be able to create files foo, Foo and FOO in the same directory.
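As a minimal sketch of how the compiled regex might be applied to one directory listing (the helper name `insensitive_listdir` is hypothetical, and only a single, non-recursive directory is assumed):

```python
import fnmatch
import os
import re


def insensitive_listdir(directory, pattern):
    """Return the names in `directory` that match the glob `pattern`, ignoring case."""
    # fnmatch.translate produces a regex that is anchored at the end of the name.
    rule = re.compile(fnmatch.translate(pattern), re.IGNORECASE)
    # rule.match anchors at the start, so the whole name has to match.
    return sorted(name for name in os.listdir(directory) if rule.match(name))
```

If foo, Foo and FOO all exist in the directory, all three are returned, which is the situation the note above warns about.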

edited Nov 16, 2011 at 12:09

answered Nov 16, 2011 at 12:04

Fred Foo

1

cool 8) is there also a function to then return the list of matching filenames, or do i manually have to go through cascades of os.listdir()?
– andreas-h
Commented Nov 16, 2011 at 13:14
3

after 2 hours of fiddling around with os.walk, i'm at a loss. could you please advise a bit more? i'm having a hard time figuring out the looping around the dirs, matching the re and breaking appropriately. probably not my day :(
– andreas-h
Commented Nov 16, 2011 at 15:44
@andreash: os.walk yields triples (basepath, dirs, files), so you can get the relative path to a dir or file by joining it (os.path.join) with basepath. You can then try to match the result with your pattern.
– Fred Foo
Commented Nov 16, 2011 at 15:51
2

i'll accept this answer, as it gives a valid response. however, i decided to use a more tailor-made combination of os.walk and os.listdir, for speed reasons.
– andreas-h
Commented Jan 18, 2012 at 12:50

Sylvester is on codidact.com · Accepted Answer · 2022-06-22 19:40:24Z

Non recursively

In order to retrieve the files (and files only) of a directory "path", with "globexpression":

list_path = [i for i in os.listdir(path) if os.path.isfile(os.path.join(path, i))]
result = [os.path.join(path, j) for j in list_path if re.match(fnmatch.translate(globexpression), j, re.IGNORECASE)]

Recursively

with walk:

result = []
for root, dirs, files in os.walk(path, topdown=True):
    result += [os.path.join(root, j) for j in files \
        if re.match(fnmatch.translate(globexpression), j, re.IGNORECASE)]

It is better to compile the regular expression once. Instead of calling

re.match(fnmatch.translate(globexpression), j, re.IGNORECASE)

for every file, do this before the loop:

reg_expr = re.compile(fnmatch.translate(globexpression), re.IGNORECASE)

and then replace the line in the loop with:

result += [os.path.join(root, j) for j in files if reg_expr.match(j)]
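Put together, the recursive variant with a precompiled expression might look like the sketch below. The function name `insensitive_walk` is hypothetical; the variable names follow the snippets above.

```python
import fnmatch
import os
import re


def insensitive_walk(path, globexpression):
    """Recursively collect files under `path` whose names match the glob, ignoring case."""
    # Compile once, outside the loop.
    reg_expr = re.compile(fnmatch.translate(globexpression), re.IGNORECASE)
    result = []
    for root, dirs, files in os.walk(path):
        # Only the file name is matched, not the directory part of the path.
        result += [os.path.join(root, name) for name in files if reg_expr.match(name)]
    return result
```

Because only file names are tested, wildcards in the directory part of a pattern are not supported by this sketch.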

user459872 · Accepted Answer · 2023-12-18 11:26:20Z

6

Starting with Python 3.12, you can call Path.glob(pattern, *, case_sensitive=None) with the case_sensitive argument set to False to get the desired behavior.

By default, or when the case_sensitive keyword-only argument is set to None, this method matches paths using platform-specific casing rules: typically, case-sensitive on POSIX, and case-insensitive on Windows. Set case_sensitive to True or False to override this behaviour.
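A short sketch of that call, assuming Python 3.12 or newer. The wrapper name `glob_ignoring_case` is hypothetical and exists only to make the version requirement explicit.

```python
import sys
from pathlib import Path


def glob_ignoring_case(folder, pattern):
    """List paths in `folder` that match `pattern` regardless of case (Python 3.12+)."""
    if sys.version_info < (3, 12):
        # The case_sensitive keyword does not exist on older versions.
        raise RuntimeError("Path.glob(case_sensitive=...) needs Python 3.12 or newer")
    return sorted(Path(folder).glob(pattern, case_sensitive=False))
```

On older interpreters the regex-based answers in this thread remain the fallback.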

answered Dec 18, 2023 at 11:26

user459872

Timothy C. Quinn · Accepted Answer · 2020-08-11 17:14:25Z

3

Here is my non-recursive file search with glob-like behavior, for Python 3.5+:

import fnmatch
import os
import re

# Eg: find_files('~/Downloads', '*.Xls', ignore_case=True)
def find_files(path: str, glob_pat: str, ignore_case: bool = False):
    rule = re.compile(fnmatch.translate(glob_pat), re.IGNORECASE) if ignore_case \
            else re.compile(fnmatch.translate(glob_pat))
    return [n for n in os.listdir(os.path.expanduser(path)) if rule.match(n)]

Note: This version handles home directory expansion

answered Aug 11, 2020 at 17:14

Timothy C. Quinn

This works except one cannot use wildcards anywhere in the path, only in the file.
– Matthew Snyder
Commented Oct 12, 2020 at 16:37
@MatthewSnyder - Thanks. When I get some time, I'll update it to handle wildcards in path also.
– Timothy C. Quinn
Commented Oct 12, 2020 at 17:11

HCLivess · Accepted Answer · 2017-03-16 10:12:44Z

1

Depending on your case, you might use .lower() on both file pattern and results from folder listing and only then compare the pattern with the filename
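One way this could look, as a sketch only: `lowercase_match` is a hypothetical helper, and fnmatch.fnmatchcase is used because it compares without any platform-dependent case normalisation.

```python
import fnmatch
import os


def lowercase_match(directory, pattern):
    """Return names in `directory` whose lower-cased form matches the lower-cased pattern."""
    lowered = pattern.lower()
    # Both sides are lower-cased explicitly before the case-sensitive comparison.
    return sorted(
        name for name in os.listdir(directory)
        if fnmatch.fnmatchcase(name.lower(), lowered)
    )
```

The original spelling of each name is returned, so the results can still be opened on a case-sensitive filesystem.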

answered Mar 16, 2017 at 10:12

HCLivess

Matthew Snyder · Accepted Answer · 2020-10-12 22:09:08Z

1

Riffing off of @Timothy C. Quinn's answer, this modification allows the use of wildcards anywhere in the path. This is admittedly only case insensitive for the glob_pat argument.

import re
import os
import fnmatch
import glob

def find_files(path: str, glob_pat: str, ignore_case: bool = False):
    rule = re.compile(fnmatch.translate(glob_pat), re.IGNORECASE) if ignore_case \
            else re.compile(fnmatch.translate(glob_pat))
    return [n for n in glob.glob(os.path.join(path, '*')) if rule.match(n)]

edited Oct 12, 2020 at 22:09

answered Oct 12, 2020 at 16:54

Matthew Snyder

paradox · Accepted Answer · 2021-01-18 12:54:15Z

Here is a working example with fnmatch.translate():

from glob import glob
from pathlib import Path
import fnmatch, re


mask_str = '"*_*_yyww.TXT" | "*_yyww.TXT" | "*_*_yyww_*.TXT" | "*_yyww_*.TXT"'
masks_list = ["yyyy", "yy", "mmmmm", "mmm", "mm", "#d", "#w", "#m", "ww"]

for mask_item in masks_list:
    mask_str = mask_str.replace(mask_item, "*")

clean_quotes_and_spaces = mask_str.replace(" ", "").replace('"', '')
remove_double_star = clean_quotes_and_spaces.replace("**", "*")
masks = remove_double_star.split("|")

cwd = Path.cwd()

files = list(cwd.glob('*'))
print(files)

files_found = set()

for mask in masks:
    mask = re.compile(fnmatch.translate(mask), re.IGNORECASE)
    print(mask)

    for file in files:        
        if mask.match(str(file)):
            files_found.add(file)         

print(files_found)

Philip Kahn · Accepted Answer · 2021-08-10 20:36:16Z

I just wanted a variant of this where I only went case insensitive if I was specifying a file extension -- eg, I wanted ".jpg" and ".JPG" to be crawled the same. This is my variant:

import re
import glob
import os
from fnmatch import translate as regexGlob
from platform import system as getOS

def linuxGlob(globPattern:str) -> frozenset:
    """
    Glob with a case-insensitive file extension
    """
    base = set(glob.glob(globPattern, recursive= True))
    maybeExt = os.path.splitext(os.path.basename(globPattern))[1][1:]
    caseChange = set()
    # Now only try the extended insensitivity if we've got a file extension
    if len(maybeExt) > 0 and getOS() != "Windows":
        rule = re.compile(regexGlob(globPattern), re.IGNORECASE)
        endIndex = globPattern.find("*")
        if endIndex == -1:
            endIndex = len(globPattern)
        crawl = os.path.join(os.path.dirname(globPattern[:endIndex]), "**", "*")
        checkSet = set(glob.glob(crawl, recursive= True)) - base
        caseChange = set([x for x in checkSet if rule.match(x)])
    return frozenset(base.union(caseChange))

I didn't actually restrict the insensitivity to just the extension because I was lazy, but that confusion space is pretty small (eg, you'd want to capture FOO.jpg and FOO.JPG but not foo.JPG or foo.jpg; if your path is that pathological you've got other problems)

matsken · Accepted Answer · 2022-06-07 02:13:03Z

0

def insensitive_glob(pattern):
    def either(c):
        return '[%s%s]' % (c.lower(), c.upper()) if c.isalpha() else c
    return glob.glob(''.join(map(either, pattern)))

This can also be written as:

def insensitive_glob(pattern):
    return glob.glob(
        ''.join([
            '[' + c.lower() + c.upper() + ']'
            if c.isalpha() else c
            for c in pattern
        ])
    )

edited Jun 7, 2022 at 2:13

answered Jun 7, 2022 at 2:05

matsken

jean42pin · Accepted Answer · 2023-03-29 07:33:57Z

0

A variation of the answer above that searches for file names recursively:

def insensitive_for_glob(string_file):
    return ''.join(['[' + c.lower() + c.upper() + ']' if c.isalpha() else c for c in string_file])

Elsewhere in the code:

namefile = self.insensitive_for_glob(namefile)
lst_found_file = glob.glob(f'{file_path}/**/*{namefile}', recursive=True)

answered Mar 29, 2023 at 7:33

jean42pin

rettentulla · Accepted Answer · 2023-07-14 08:08:11Z

0

You can simply search for upper and lower cases and then add the results like so:

from pathlib import Path

folder = Path('some_folder')
file_filter = '.txt'
files_in_folder = [files for files in (
    list(folder.glob(f'*{file_filter.lower()}'))+
    list(folder.glob(f'*{file_filter.upper()}'))
)]

This will find files both with .txt as well as .TXT endings.
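Mixed-case endings such as .Txt are not covered by the two globs above. A sketch that compares the lower-cased suffix instead, assuming the filter is a plain extension like '.txt' (the function name `files_with_suffix` is hypothetical):

```python
from pathlib import Path


def files_with_suffix(folder, suffix):
    """Return the files in `folder` whose extension equals `suffix`, ignoring case."""
    wanted = suffix.lower()
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == wanted
    )
```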

answered Jul 14, 2023 at 8:08

rettentulla

1

You missed out a newline before "folder = Path('some_folder')"
– Alex Ramses
Commented May 21 at 3:12
