Find all python files linux

Find all python files in linux file system [duplicate]

How can I search my entire Linux filesystem for all python files (files with extension .py )? I know I can do find -name filename but how can I do it for file type?

3 Answers

Use -name '*.py', since -name accepts a pattern.

Mind the single quotes.

sudo find / -name "*.py"

You only need sudo to avoid Permission denied errors (since you're searching from the root).

Not all Python files have the file extension .py (try running grep -rni python /usr/bin, for example). Most of these scripts have a ‘shebang’ (or hashbang) line (e.g. #!/usr/bin/env python, #!/usr/bin/python2.7). This line tells the system which program to use to run the script, and you could search for it to find Python files.
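As a rough illustration of the shebang idea in Python rather than with find, the sketch below walks a directory tree and reports files that either end in .py or begin with a shebang line mentioning python. The names has_python_shebang and find_python_scripts are made up for this example, and the check is only a heuristic, not a reliable file-type test.

```python
import os


def has_python_shebang(path):
    """Return True if the file's first line is a shebang that mentions python."""
    try:
        with open(path, "rb") as handle:
            first_line = handle.readline(200)
    except OSError:
        # Unreadable files (e.g. permission denied) are treated as non-matches.
        return False
    return first_line.startswith(b"#!") and b"python" in first_line


def find_python_scripts(root):
    """Yield regular files under root that end in .py or have a Python shebang."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            if not os.path.isfile(full_path):
                continue  # skip broken symlinks, sockets, FIFOs, etc.
            if name.endswith(".py") or has_python_shebang(full_path):
                yield full_path
```

Something like list(find_python_scripts("/usr/bin")) would then play a similar role to the grep suggestion above.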

However, you can also use the file’s mimetype (usually along the lines of text/x-python ) to find it:

Where / is your intended search directory.

With find you could also add the -executable option to look only for executable files. The -type f option restricts find to regular files; you could change this to show symbolic links etc. as well (some scripts are contained in /usr/lib etc. and symlinked to /usr/bin/ etc.). Many more options are available; you can see these by running man find.

file should be able to guess the file type even if the file has no extension (using the shebang line, for example).

To remove any find: ‘/. /FILE’: Permission denied errors, you can run the command as root (using sudo bash -c "COMMAND", opening a shell with sudo su, etc.), and/or simply append 2>/dev/null to the find command.


How to List Files in a Directory Using Python?

In this tutorial, we’re covering everything you need to know about how to list files in a directory using Python.

Python is a general-purpose language, used in a variety of fields like Data Science, Machine Learning, and even in Web Development. There seems to be no restriction in the application of Python Language.

It is therefore no surprise that Python can be used to list files and directories on any system. The aim of this article is to show the reader the ways to list files on a system using Python.

List All Files in a Directory Using Python

For the purpose of interacting with directories in a system using Python, the os library is used.

1. Using the ‘os’ library

The function we use for this is os.listdir(). As the name suggests, it lists the items in a directory.
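A minimal sketch of that call, wrapped in a helper named list_items (a name chosen here for illustration). The result is sorted only to make the order predictable, since os.listdir() returns entries in arbitrary order.

```python
import os


def list_items(directory="."):
    """Return the sorted names of all entries (files and folders) in directory."""
    return sorted(os.listdir(directory))


if __name__ == "__main__":
    # Print every entry of the current working directory, one per line.
    for name in list_items("."):
        print(name)
```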


Linux users can easily match the above output using the standard ls command on the terminal.


As we can see, the outputs of the two methods match.

2. Using the ‘glob’ library

glob is mostly a filename pattern matching library, but it can be used to list items in the current directory by:


The wildcard character ‘*’ matches all the items in the current directory, except those whose names start with a dot. Since we only want the items of the current directory, we leave glob() in its default non-recursive mode (recursive=False).
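The glob approach might look like the sketch below. The helper name glob_items is an assumption made for this example; unlike os.listdir(), glob returns paths that include the directory prefix.

```python
import glob
import os


def glob_items(directory="."):
    """Return the sorted paths of the non-hidden entries directly inside directory."""
    pattern = os.path.join(directory, "*")
    # recursive defaults to False, so only the top level is matched.
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    for path in glob_items("."):
        print(path)
```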

3. List only files in the current directory

In the above methods, the Python code returned all the items in the current directory, whether they were files or directories. We can extract only the files using the os.path.isfile() function.


In the above code snippet, a list comprehension keeps only those items that are actually files.

Another key thing to note is that the above code does not work for other directories as written: os.listdir() returns bare names, so the variable ‘f’ is not an absolute path and isfile() resolves it relative to the current directory. For another directory, join the directory with each name first.
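One way to make the filter work for any directory is to join the directory and the name before calling isfile(), as in this sketch (list_files is a hypothetical helper name):

```python
import os


def list_files(directory="."):
    """Return the sorted names of the regular files directly inside directory."""
    return sorted(
        name
        for name in os.listdir(directory)
        # Join first, so the check does not depend on the current directory.
        if os.path.isfile(os.path.join(directory, name))
    )
```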

List All Files in a Directory Recursively

In order to print the files inside a directory and its subdirectories, we need to traverse them recursively.

1. Using the ‘os’ library

With the help of the walk() method, we can traverse each subdirectory within a directory one by one.


The os.walk() function visits each subdirectory and, by default, reports results in a top-down manner. On each iteration it yields a tuple of three values, which we unpack into three variables:

  • path – The directory the function is currently visiting during this iteration
  • folders – A list of the names of the directories inside the ‘path’ directory
  • files – A list of the names of the files inside the ‘path’ directory

The join() method is used to concatenate the file name with its parent directory, providing us with the relative path to the file.
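The traversal described above can be sketched as follows; walk_files is a name invented for this example, and the variable names follow the ones used in the explanation.

```python
import os


def walk_files(root):
    """Return sorted paths of every file under root, including subdirectories."""
    found = []
    for path, folders, files in os.walk(root):
        for name in files:
            # Combine the directory being visited with the bare file name.
            found.append(os.path.join(path, name))
    return sorted(found)
```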

2. Using the ‘glob’ library

Similar to the above procedure, glob can recursively visit each directory and extract all items and return them.


The ‘**’ pattern used along with the path variable tells the glob() function to match files within any subdirectory; it only has this meaning when recursive=True is passed. The ‘*’ tells the function to match all the items within a directory.

Since we wish to extract only the files in the complete directory, we filter out the files using the isfile() function used before.
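A sketch of the recursive glob approach, assuming a helper named glob_files_recursive (not from the article):

```python
import glob
import os


def glob_files_recursive(root):
    """Return sorted paths of all non-hidden files under root, at any depth."""
    pattern = os.path.join(root, "**", "*")
    matches = glob.glob(pattern, recursive=True)
    # The pattern also matches directories, so keep regular files only.
    return sorted(p for p in matches if os.path.isfile(p))
```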

List All Subdirectories Inside a Directory

Instead of listing files, we can list all the subdirectories present in a specific directory.


The only difference between listing files and listing directories is which of the values yielded by os.walk() we loop over. For files, we iterate over the files variable. Here, we loop over the folders variable.
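In code, the change amounts to looping over the second element of each tuple, as in this sketch (walk_folders is a hypothetical name):

```python
import os


def walk_folders(root):
    """Return sorted paths of every subdirectory under root, at any depth."""
    found = []
    for path, folders, _files in os.walk(root):
        for name in folders:
            found.append(os.path.join(path, name))
    return sorted(found)
```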

List Files in a Directory with Absolute Path

Once we know how to list files in a directory, then displaying the absolute path is a piece of cake. The abspath() method provides us with the absolute path for a file.


One thing to note here is that abspath() must be provided with the relative path of the file and that is the purpose of join() function.
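Combining join() and abspath() might look like this sketch; absolute_file_paths is a name assumed for the example.

```python
import os


def absolute_file_paths(root):
    """Return sorted absolute paths of every file under root."""
    found = []
    for path, _folders, files in os.walk(root):
        for name in files:
            # join() builds the path relative to root; abspath() anchors it
            # to the current working directory if root itself was relative.
            found.append(os.path.abspath(os.path.join(path, name)))
    return sorted(found)
```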

List Files in a Directory by Matching Patterns

There are multiple ways to filter out filenames matching a particular pattern. Let us go through each of them one by one.

1. Using the ‘fnmatch’ library

As the name suggests, fnmatch is a filename pattern matching library. Using fnmatch with our standard filename extracting libraries, we can filter out those files matching a specific pattern.


The fnmatch() function takes in two parameters, the filename followed by the pattern to be matched. In the above code, we are looking at all the files containing the word file in them.
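A minimal sketch of this filter, assuming the pattern ‘*file*’ for "contains the word file" and a helper name, files_matching, chosen for illustration:

```python
import fnmatch
import os


def files_matching(root, pattern="*file*"):
    """Return sorted paths of files under root whose names match pattern."""
    found = []
    for path, _folders, files in os.walk(root):
        for name in files:
            # fnmatch() takes the file name first, then the pattern.
            if fnmatch.fnmatch(name, pattern):
                found.append(os.path.join(path, name))
    return sorted(found)
```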


2. Using the ‘glob’ library

As we mentioned before, glob’s primary purpose is filename pattern matching.


The glob pattern ‘**/*3*.*’ (a shell-style wildcard pattern, not a regular expression) can be explained as:

  • ‘**’ – Traverse all subdirectories inside the path (this requires recursive=True)
  • ‘/*’ – The filename can start with any characters
  • ‘3’ – The filename contains the digit 3
  • ‘*.*’ – The filename can continue with any characters and must contain a dot followed by any extension
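The pattern above could be applied as in the following sketch. The helper name glob_pattern is an assumption; the default pattern is the one discussed in the text.

```python
import glob
import os


def glob_pattern(root, pattern="**/*3*.*"):
    """Return sorted paths under root that match the given glob pattern."""
    # recursive=True is what lets '**' descend into subdirectories.
    return sorted(glob.glob(os.path.join(root, pattern), recursive=True))
```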

3. Using the ‘pathlib’ library

pathlib follows an object-oriented way of interacting with the filesystem. The rglob() method of a Path object can be used to recursively extract a list of files under that path.

This list of files can be filtered using a pattern passed to rglob().


The above code snippet is used to list all the files starting with the letter ‘m’ .
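A sketch of the rglob() approach for names starting with ‘m’; the helper name files_starting_with and its prefix parameter are assumptions made for this example.

```python
from pathlib import Path


def files_starting_with(root, prefix="m"):
    """Return sorted Path objects for files under root whose names start with prefix."""
    # rglob() searches root and every subdirectory; directories that match
    # the pattern are dropped by the is_file() check.
    return sorted(p for p in Path(root).rglob(prefix + "*") if p.is_file())
```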

List Files in a Directory with a Specific Extension

Listing files with a specific extension in Python is somewhat similar to pattern matching. For this purpose, we need to create a pattern with respect to the file extension.


The fnmatch() function keeps only those files whose names end with ‘.py’, that is, Python files. To select files with a different extension, change that part of the pattern. For example, to fetch only C++ source files, replace ‘.py’ with ‘.cpp’.
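The extension filter can be sketched as below, with the extension passed as a parameter so it is easy to swap. The helper name files_with_extension is hypothetical.

```python
import fnmatch
import os


def files_with_extension(root, extension=".py"):
    """Return sorted paths of files under root whose names end with extension."""
    pattern = "*" + extension  # e.g. '*.py' or '*.cpp'
    found = []
    for path, _folders, files in os.walk(root):
        for name in fnmatch.filter(files, pattern):
            found.append(os.path.join(path, name))
    return sorted(found)
```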

This sums up the ways to fetch list of files in a directory using Python.

Conclusion

There can be multiple ways to solve any problem at hand, and the most convenient one is not always the answer. With respect to this article, a Python programmer must be aware of every way we can list files in a directory.

We hope this article was easy to follow. Feel free to comment below for any queries or suggestions.


Find all files in a directory with extension .txt in Python

How can I find all the files in a directory having the extension .txt in python?

26 Answers

or if you want to traverse directory, use os.walk :

Something like that should do the job

Something like this will work:

You can simply use pathlib's glob [1]:

If you want it recursive you can use .glob('**/*.txt').

[1] The pathlib module was included in the standard library in Python 3.4. But you can install back-ports of that module even on older Python versions (e.g. using conda or pip): pathlib and pathlib2.
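A minimal sketch of the pathlib approach, covering both the flat and the recursive variant. The function name txt_files and its recursive flag are assumptions for illustration, not part of the original answer.

```python
from pathlib import Path


def txt_files(directory, recursive=False):
    """Return sorted Path objects for .txt files in directory."""
    # '**/' makes glob() descend into subdirectories as well.
    pattern = "**/*.txt" if recursive else "*.txt"
    return sorted(Path(directory).glob(pattern))
```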

Or with generators:

Here’s more versions of the same that produce slightly different results:

glob.iglob()

glob.glob1()

fnmatch.filter()

Python v3.5+

Fast method using os.scandir in a recursive function. Searches for all files with a specified extension in folder and sub-folders. It is fast, even for finding 10,000s of files.
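The original answer's code is not reproduced here; the sketch below shows one way such a recursive os.scandir search could look. The name scan_for_extension is invented, the comparison is made case-insensitive as a design choice, and the with-statement form of os.scandir needs Python 3.6 or later.

```python
import os


def scan_for_extension(root, extension):
    """Recursively yield paths under root whose names end with extension."""
    wanted = extension.lower()
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                # Descend into real subdirectories, but not symlinked ones.
                yield from scan_for_extension(entry.path, extension)
            elif entry.is_file() and entry.name.lower().endswith(wanted):
                yield entry.path
```

Because it is a generator, wrap the call in list() when a full list is needed, for example list(scan_for_extension(".", ".txt")).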

I have also included a function to convert the output to a Pandas Dataframe.

Try this; it will find all your files recursively:

To get all ‘.txt’ file names inside ‘dataPath’ folder as a list in a Pythonic way:

Python has all tools to do this:

I did a test (Python 3.6.4, W7x64) to see which solution is the fastest for one folder, no subdirectories, to get a list of complete file paths for files with a specific extension.


Lee On Coding

My blog about coding and stuff.

How does python find packages?

I just ran into a situation where I compiled and installed Python 2.7.9 from source on Ubuntu, but Python could not find the packages I had previously installed. This naturally raises the question: how does Python know where to find packages when you call import? This post applies specifically to Python 2.7.9, but I’m guessing Python 3.x works very similarly.

In this post I first describe how Python finds packages, and then I’ll finish with the discovery I made regarding the default Python that ships with Ubuntu and how it differs from vanilla Python in how it finds packages.


sys.path

Python imports work by searching the directories listed in sys.path .
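To look at these directories on your own machine, something like the sketch below can be used. The helper existing_search_dirs is a made-up name, and the actual entries depend entirely on your installation.

```python
import os
import sys


def existing_search_dirs():
    """Return the sys.path entries that currently exist as directories."""
    # An empty string in sys.path stands for the current working directory.
    return [entry for entry in sys.path if os.path.isdir(entry or os.curdir)]


if __name__ == "__main__":
    for entry in sys.path:
        print(repr(entry))
```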

Using my default Ubuntu 14.04 Python:

So Python will find any packages that have been installed to those locations.

How sys.path gets populated

As the docs explain, sys.path is populated using the directory containing the script being run (or the current working directory in an interactive session), followed by directories listed in your PYTHONPATH environment variable, followed by installation-dependent default paths, which are controlled by the site module.

You can read more about sys.path in the Python docs.

Assuming your PYTHONPATH environment variable is not set, sys.path will consist of the script directory (or the current working directory) plus any manipulations made to it by the site module.

The site module is automatically imported when you start Python; you can read more about how it manipulates your sys.path in the Python docs.

It’s a bit involved.

You can manipulate sys.path

You can manipulate sys.path during a Python session and this will change how Python finds modules. For example:
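The post's own example is not reproduced here. As a rough sketch of the idea, the hypothetical helper below puts a directory on sys.path, imports a module from it, and then restores sys.path; the function name and its arguments are assumptions for illustration.

```python
import importlib
import sys


def import_from_directory(directory, module_name):
    """Temporarily put directory on sys.path and import module_name from it."""
    sys.path.insert(0, directory)
    try:
        # Make sure the import system notices files created after startup.
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
    finally:
        # Undo the manipulation so later imports are not affected.
        sys.path.remove(directory)
```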

The module __file__ attribute

When you import a module, you usually can check the __file__ attribute of the module to see where the module is in your filesystem:

However, the Python docs state that:

The __file__ attribute is not present for C modules that are statically linked into the interpreter; for extension modules loaded dynamically from a shared library, it is the pathname of the shared library file.

So, for example this doesn’t work:

It makes sense that the sys module is statically linked to the interpreter — it is essentially part of the interpreter!
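A small sketch of checking __file__ without tripping over modules that lack it; module_location is a name invented here.

```python
import json
import sys


def module_location(module):
    """Return the module's __file__ path, or None if it has no such attribute."""
    return getattr(module, "__file__", None)


if __name__ == "__main__":
    print(module_location(json))  # a path inside the standard library
    print(module_location(sys))   # None: sys is built into the interpreter
```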

The imp module

Python exposes the entire import system through the imp module. That’s pretty cool that all of this stuff is exposed for us to abuse, if we wanted to.

imp.find_module can be used to find a module:

You can also import an arbitrary Python source file as a module using imp.load_source . This is the same example as before, except that it imports our module using imp instead of by manipulating sys.path :

Passing ‘hi’ to imp.load_source simply sets the __name__ attribute of the module.
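The imp module belongs to the Python 2.7 setup this post describes; it was deprecated in Python 3.4 and removed in Python 3.12. As a substitute technique, not the post's code, the sketch below loads a source file with importlib.util. The helper name load_source_sketch is an assumption, and this sketch does not register the module in sys.modules.

```python
import importlib.util


def load_source_sketch(module_name, file_path):
    """Load the Python source at file_path as a module named module_name."""
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    module = importlib.util.module_from_spec(spec)
    # Run the file's code inside the new module object.
    spec.loader.exec_module(module)
    return module
```

As with imp.load_source in the post, the first argument only determines the module's __name__ attribute.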

Ubuntu Python

Now back to the issue of missing packages after installing a new version of Python compiled from source. By comparing the sys.path from both the Ubuntu Python, which resides at /usr/bin/python , and the newly installed Python, which resides at /usr/local/bin/python , I could sort things out:

Ubuntu Python ( /usr/bin/python ):

Python compiled from source ( /usr/local/bin/python )

Turns out what mattered for me was dist-packages vs. site-packages . Using Ubuntu’s Python, my packages were installed to /usr/local/lib/python2.7/dist-packages , whereas the new Python I installed expects packages to be installed to /usr/local/lib/python2.7/site-packages . I just had to manipulate the PYTHONPATH environment variable to point to dist-packages in order to gain access to the previously installed packages with the newly installed version of Python.

How did Ubuntu manipulate the sys.path ?

So how does the Ubuntu distribution of Python know to use /usr/local/lib/python2.7/dist-packages in sys.path ? It’s hardcoded into their site module! First, find where the site module code lives:

Here is an excerpt from Ubuntu Python’s site.py , which I peeked by opening /usr/lib/python2.7/site.py in a text editor. First, a comment at the top:

For Debian and derivatives, this sys.path is augmented with directories for packages distributed within the distribution. Local addons go into /usr/local/lib/python<version>/dist-packages, Debian addons install into /usr/{lib,share}/python<version>/dist-packages. /usr/lib/python<version>/site-packages is not used.

OK so there you have it. They explain how the Debian distribution of Python is different.

And now, for the code that implements this change:

It’s all there, if you are crazy enough to dig this deep.

© Lee Mendelowitz – Built with Pure Theme for Pelican
