Linux counting files recursively

Recursively Counting files by Extension on Mac or Linux

Back in 2004 I wrote up a blog entry showing how to get a count of files with a specific extension. For example, if you want to know how many .js files are in a directory, you can run this:
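find /some/dir | fgrep -c .js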

The -c tells grep to count the matching lines. I'm using fgrep here because I'm not using a regex (which avoids having to escape the dot).

The above would also match a file or directory that has .js anywhere in its path, so we can improve that script by anchoring the pattern with the regular-expression $ character, for example:
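find /some/dir | grep -c "\.js$"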

Now we are limiting .js to show up only at the end of the filename.

What if you want a listing of all file extensions and the count of files in a directory?

Here’s one way to print out a list of extensions and the number of files of each type:
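find /some/dir -type f | grep -o ".[^.]\+$" | sort | uniq -c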

This will print out a nice list like this (the counts shown are illustrative):
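  12 .css
  24 .html
 108 .js
   6 .json
   3 .md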

How it works

First we have find /some/dir -type f which just limits find to output all the files in the directory recursively. The -type f omits directories from showing up in the list.

Next we have grep -o ".[^.]\+$" . The -o tells grep to print only the part of each line that matches the pattern. The pattern is just a regex that says: look for a dot, followed by one or more characters that are not a dot ( [^.]\+ ), at the end of the line ( $ ).

Next we pipe into the sort command, which puts everything in order. This matters because uniq only collapses adjacent duplicate lines, so identical extensions need to be grouped together before counting.

Finally we pipe into uniq -c which counts each unique line (the file extensions) and prints out the results. Cool!


Recursively Counting files by Extension on Mac or Linux was first published on October 09, 2019.


Comments

find -type f | grep -o ".[^./]\+$" | sort | uniq -c | sort -n

Include / in [^./] to exclude results where a file has no extension but a . appears in a directory name in the path.


Recursively count all the files in a directory [duplicate]

I have a really deep directory tree on my Linux box. I would like to count all of the files in that path, including all of the subdirectories.


For instance, given a directory tree like this (the names here are illustrative):
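/home
├── docs
│   ├── notes.txt
│   └── todo.txt
├── photos
│   └── vacation.jpg
└── readme.txt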

If I pass in /home , I would like for it to return four files. Or, bonus points if it returns four files and two directories. Basically, I want the equivalent of right-clicking a folder on Windows, selecting Properties, and seeing how many files/folders are contained in that folder.

How can I most easily do this? I have a solution involving a Python script I wrote, but why isn’t this as easy as running ls | wc or similar?

5 Answers

find . -type f | wc -l

Explanation:
find . -type f finds all files ( -type f ) in this directory ( . ) and in all subdirectories; the filenames are then printed to standard output, one per line.

This is then piped ( | ) into wc (word count); the -l option tells wc to count only the lines of its input.

Together they count all your files.

The answers above already answer the question, but I'll add that if you use find without arguments (other than the folder where you want the search to happen), as in:
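find . | wc -l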

the search goes much faster, almost instantaneous, or at least it does for me. This is because the type clause has to run a stat() system call on each name to check its type — omitting it avoids doing so.

This has the difference of returning the count of files plus folders instead of only files, but at least for me it's enough, since I mostly use it to find which folders have huge amounts of files that take forever to copy and compress. Counting folders still allows me to find the folders with the most files; I need speed more than precision.


How do I count all the files recursively through directories

I want to see how many files are in each subdirectory, to find out where all the inode usage is on the system. Kind of like how I would do this for space usage (something along these lines):
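du -h --max-depth=1 /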

which will give me the space used in the directories off of root, but in this case I want the number of files, not the size.

10 Answers

Thanks to Gilles and xenoterracide for safety/compatibility fixes.
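find . -maxdepth 1 -type d | while read -r dir
do
    printf "%s:\t" "$dir"
    find "$dir" -type f | wc -l
done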

The first part: find . -maxdepth 1 -type d will return a list of all directories in the current working directory. (Warning: -maxdepth is a GNU extension and might not be present in non-GNU versions of find .) This is piped to the while loop.

The second part: while read -r dir; do (shown above as while read -r dir (newline) do ) begins a while loop – as long as the pipe coming into the while is open (which is until the entire list of directories is sent), the read command will place the next line into the variable dir . Then it continues.


The third part: printf "%s:\t" "$dir" will print the string in $dir (which is holding one of the directory names) followed by a colon and a tab (but not a newline).

The fourth part: find "$dir" -type f makes a list of all the files inside the directory whose name is held in $dir . This list is sent to wc .

The fifth part: wc -l counts the number of lines that are sent into its standard input.

The final part: done simply ends the while loop.

So we get a list of all the directories in the current directory. For each of those directories, we generate a list of all the files in it so that we can count them all using wc -l . The result will look like this (directory names and counts are illustrative; note that . itself is included, and its count covers everything):
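.:	313
./backups:	17
./docs:	34
./music:	258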


Recursively count files in all nested subdirectories on Linux [duplicate]

How can I recursively count the number of files in each subdirectory on a Linux system? I know

These commands give really nice and informative output but do not count files. I was trying to

But it failed. The output looked like this:

There are spaces and special characters in the directory names.

1 Answer

After researching and testing, I've got:
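find . -maxdepth 1 -type d -print0 | sort -z | while IFS= read -r -d '' i; do echo -n "$i: "; find "$i" -type f | wc -l; done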

Explanation

-maxdepth 1 — I need only one level of recursion

-type d — only directories

-print0 | while IFS= read -r -d '' i — I have spaces in the directory names. The -r option to read prevents backslash interpretation (usually used as a backslash-newline pair, to continue over multiple lines or to escape the delimiters). Without this option, any unescaped backslashes in the input will be discarded. You should almost always use the -r option with read .

The most common exception to this rule is when -e is used, which uses Readline to obtain the line from an interactive shell. In that case, tab completion will add backslashes to escape spaces and such, and you do not want them to be literally included in the variable. This would never be used when reading anything line-by-line, though, and -r should always be used when doing so.

By default, read modifies each line read, by removing all leading and trailing whitespace characters (spaces and tabs, if present in IFS ). If that is not desired, the IFS variable may be cleared, as in the example above.
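A quick way to see the difference in bash (the input string is arbitrary):

printf '  two  words  \n' | { read -r line; echo "[$line]"; }
# prints [two  words] : the default IFS strips leading and trailing whitespace
printf '  two  words  \n' | { IFS= read -r line; echo "[$line]"; }
# prints [  two  words  ] : clearing IFS preserves the line exactly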

The IFS variable is used in shells (Bourne, POSIX, ksh, bash) as the input field separator (or internal field separator). Essentially, it is a string of special characters which are to be treated as delimiters between words/fields when splitting a line of input.

The default value of IFS is space, tab, newline. (A three-character string.) If IFS is unset, it acts as though it were set to this default value. (This is presumably for simplicity in shells that do not support the $'...' syntax for special characters.) If IFS is set to an empty string (which is very different from unsetting it!) then no splitting will be performed.


In the read command, if multiple variable-name arguments are specified, IFS is used to split the line of input so that each variable gets a single field of the input. (The last variable gets all the remaining fields, if there are more fields than variables.)

sort -z — sorts the NUL-delimited output of find in alphabetical order

do echo -n "$i: " — prints the directory name and a colon (with no trailing newline)

find "$i" -type f — finds files only inside each directory

wc -l — displays the number of files (the number of lines in the second find 's output)


Fast way to recursively count files in linux

I’m using the following to count the number of files in a directory, and its subdirectories:
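find . -type f | wc -l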

But I have half a million files in there, and the count takes a long time.

Is there a faster way to get a count of the number of files, that doesn’t involve piping a huge amount of text to something that counts lines? It seems like an inefficient way to do things.

7 Answers

If you have this on a dedicated file-system, or if you have a steady overhead of other files, you may be able to get a rough enough count of the number of files by looking at the number of inodes in the file-system via "df -i" (the device name and totals below are illustrative):
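$ df -i /
Filesystem      Inodes IUsed   IFree IUse% Mounted on
/dev/sda1      1966080 75885 1890195    4% /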

On my test box above I have 75,885 inodes allocated. However, these inodes are not just files; they also include directories. For example (a hypothetical demonstration):
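$ mkdir brand-new-dir
$ df -i /
# IUsed goes up by one even though no file was created; the new directory consumed an inode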

NOTE: Not all file-systems maintain inode counts the same way. ext2/3/4 all work; btrfs, however, always reports 0.

If you have to differentiate files from directories, you're going to have to walk the file-system and "stat" each one to see if it's a file, directory, sym-link, etc. The biggest issue here is not the piping of all the text to "wc", but seeking around among all the inodes and directory entries to put that data together.

Other than the inode table as shown by "df -i", there really is no database of how many files there are under a given directory. However, if this information is important to you, you could create and maintain such a database by having your programs increment a number when they create a file in this directory and decrement it when deleted. If you don't control the programs that create them, this isn't an option.

