Old 11-09-2009, 12:03 AM   #1
loren_ryter
Triple-A Player
 
Join Date: Jan 2002
Posts: 110
count directory files quickly for massive directories?

I need to be able to get a file count of a directory that may contain > 10,000 files.

Everything I've seen, including this hint here, basically suggests piping ls to wc -l.

Unfortunately in a directory that large, the command:

Code:
ls /path/dir/ | wc -l
.. just hangs.

Is there no better, faster way to get this information without counting lines of output from unix?

I'd even be satisfied if it could test if there were more than X files and just give up instead of trying to get a count and hanging.
Old 11-09-2009, 09:43 AM   #2
fracai
All Star
 
Join Date: May 2004
Posts: 912
You might try something using "find" instead.
I found this to be quite a bit faster than piping ls.

Code:
find /path/dir/ -maxdepth 1 -mindepth 1  | wc -l
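For the "more than X files" case from the first post, here is a minimal sketch built on the same find command. The helper name has_more_than is hypothetical, and it assumes filenames do not contain newlines (each newline would be counted as an extra entry). Because head exits after reading limit+1 lines, find is normally cut short by the closed pipe rather than listing the whole directory, although output buffering means it may read somewhat past the limit.

```shell
# Hypothetical helper: succeeds if DIR contains more than LIMIT entries.
# Assumes no filename contains a newline.
has_more_than() {
    dir=$1
    limit=$2
    # head stops after limit+1 lines; find then ends early on the closed pipe.
    # The arithmetic expansion strips the padding some wc versions print.
    seen=$(( $(find "$dir" -mindepth 1 -maxdepth 1 | head -n "$((limit + 1))" | wc -l) ))
    [ "$seen" -gt "$limit" ]
}
```

Usage would look like: has_more_than /path/dir 10000 && echo "too many files"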
Old 11-09-2009, 09:56 AM   #3
fracai
All Star
 
Join Date: May 2004
Posts: 912
My quick test:
Code:
$ mkdir ~/test/dir/

$ for F in `seq -w 0 100000` ; do touch ~/test/dir/"$F" ; done

$ time ls ~/test/dir/ | wc -l
100001

real	0m1.448s
user	0m0.716s
sys	0m0.608s

$ time find ~/test/dir/ -maxdepth 1 -mindepth 1 | wc -l
100001

real	0m0.265s
user	0m0.088s
sys	0m0.104s
Odd that your system is hanging on this.
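One likely contributor to the gap between ls and find is that ls sorts its output by default. As a sketch only, ls -f lists entries in directory order without sorting. It also implies -a, so "." and ".." (and any dotfiles) appear in the output. The function name count_unsorted is made up for illustration, and the count assumes filenames without newlines.

```shell
# Hypothetical helper: count entries in DIR without sorting them.
# ls -f implies -a, so "." and ".." are listed and subtracted here.
# Dotfiles are included in the result, unlike a plain "ls | wc -l".
count_unsorted() {
    echo $(( $(ls -f "$1" | wc -l) - 2 ))
}
```

Usage would look like: count_unsorted ~/test/dir/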

Last edited by fracai; 11-09-2009 at 09:59 AM.
Old 11-09-2009, 12:05 PM   #4
hayne
Moderator
 
Join Date: Jan 2002
Location: Montreal
Posts: 29,276
The fact that your system hangs on that command suggests that something else is wrong. Maybe you have filesystem corruption, or maybe some of the filenames in that directory are causing problems.
__________________
hayne.net/macosx.html
Old 11-09-2009, 12:20 PM   #5
brettgrant99
Major Leaguer
 
Join Date: May 2007
Posts: 460
I tend to use tcsh and haven't used bash too much, so with that in mind, in tcsh ...

Code:
set xx = *
echo $#xx
will get you the number of files that match '*'.
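For Bourne-style shells (sh, bash), a rough equivalent is to expand the glob into the positional parameters and read $#. This is only a sketch: the name count_glob is hypothetical and, like '*' in tcsh, it skips dotfiles. The expansion happens inside the shell, so no external command receives the long argument list.

```shell
# Hypothetical helper: count the names matching DIR/* using only the shell.
count_glob() {
    (
        # Run in a subshell so the caller's positional parameters are untouched.
        set -- "$1"/*
        # With no matches the pattern is left as a literal string; detect that.
        if [ "$#" -eq 1 ] && [ ! -e "$1" ] && [ ! -L "$1" ]; then
            echo 0
        else
            echo "$#"
        fi
    )
}
```

Usage would look like: count_glob /path/dir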

Delving into things that I know nothing about, I believe that there are some kernel variables that control how much memory is allowed to be used for some of this stuff.

You could also try:
Code:
echo * | xargs ls -1 | wc -l
although I am unsure if that will yield the proper number. But if you want to do something to all of those files (for example, run a sed or awk command), that works well too, especially if you get an "Argument list too long" error.

HTH,
Brett
Old 11-09-2009, 12:30 PM   #6
fracai
All Star
 
Join Date: May 2004
Posts: 912
"echo *" may run up against argument limits. Additionally, piping echo into xargs is redundant; ls * achieves the same thing and doesn't choke on files containing spaces.
Old 11-09-2009, 05:51 PM   #7
brettgrant99
Major Leaguer
 
Join Date: May 2007
Posts: 460
I have never had an issue where the wildcard character is only *.

However, I have had directories where you could:

ls * and get output, but
ls *.txt will fail with an "Argument list too long" error.

Using echo *.txt | xargs ls will work, where ls *.txt or find . -name '*.txt' won't.
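A sketch of one way to avoid long argument lists while also coping with spaces in filenames: let find emit NUL-terminated names and have xargs -0 split them into batches that fit the limit. The helper name for_each_txt is hypothetical. The -print0 and -0 options are extensions found in both the BSD and GNU tools rather than POSIX requirements, so treat their availability as an assumption.

```shell
# Hypothetical helper: run a command over every *.txt file directly in DIR.
# Usage: for_each_txt DIR command [args...]
for_each_txt() {
    search_dir=$1
    shift
    # The quoted pattern is matched by find itself, not expanded by the shell.
    # xargs appends the filenames to the given command in safe-sized batches.
    find "$search_dir" -mindepth 1 -maxdepth 1 -name '*.txt' -print0 | xargs -0 "$@"
}
```

Usage would look like: for_each_txt /path/dir grep -l pattern
The limit itself can be inspected with getconf ARG_MAX.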

As a side note, you should be careful with large directories. In Unix, a directory is simply a file with a list of the files that it should contain. I have seen people try to put too many files in a directory and not be able to retrieve them because the number of characters of all the filenames exceeds an allowable file size. This allowable size seems to be compiled into the kernel, so it may be different on different machines. It may not happen in SL, but I have seen it happen in Tiger, Leopard, Solaris, and RH Linux.

Again, I am only saying things that I have personally experienced; it doesn't mean that I am correct.

HTH,
Brett
Old 11-09-2009, 06:03 PM   #8
Hal Itosis
MVP
 
Join Date: Apr 2002
Posts: 2,112
man3 is a fairly hefty directory everyone here can easily test...
Code:
$ time ls /usr/share/man/man3 |wc -l
    5104

real	0m0.027s
user	0m0.019s
sys	0m0.012s
Old 11-09-2009, 07:11 PM   #9
hayne
Moderator
 
Join Date: Jan 2002
Location: Montreal
Posts: 29,276
The wallclock ("real") time for that command will depend strongly on what is in your disk cache (as well as on the speed of your disk and the speed of your CPU of course).
Compare the numbers I got on my MacBook Pro in initial and subsequent runs, both before and after I did a 'purge':
Code:
$ time ls /usr/share/man/man3 |wc -l
    6352

real	0m0.911s
user	0m0.144s
sys	0m0.045s

% time ls /usr/share/man/man3 |wc -l
    6352

real	0m0.152s
user	0m0.140s
sys	0m0.014s

% time ls /usr/share/man/man3 |wc -l
    6352

real	0m0.155s
user	0m0.144s
sys	0m0.014s

% /usr/bin/purge

% time ls /usr/share/man/man3 |wc -l
    6352

real	0m0.586s
user	0m0.144s
sys	0m0.020s

% time ls /usr/share/man/man3 |wc -l
    6352

real	0m0.153s
user	0m0.136s
sys	0m0.009s

% time ls /usr/share/man/man3 |wc -l
    6352

real	0m0.149s
user	0m0.138s
sys	0m0.013s

Last edited by hayne; 11-09-2009 at 07:14 PM.
Old 11-10-2009, 12:15 AM   #10
loren_ryter
Triple-A Player
 
Join Date: Jan 2002
Posts: 110
Interesting results, guys. I get similar results. I was looking for an answer for a user who has that many files in an "Unknown Album" directory of iTunes (i.e., untagged files) -- and guess what -- it's iTunes that puts them all there. Not best practices, true.

Based on these results, he may have a failing hard drive.

Still, I'm a bit surprised that there is nothing internal to the file system that maintains counts of files in a directory that can be accessed from the command line.

Thanks for all the reports.
Old 11-10-2009, 12:59 AM   #11
hayne
Moderator
 
Join Date: Jan 2002
Location: Montreal
Posts: 29,276
Quote:
Originally Posted by loren_ryter
Based on these results, he may have a failing hard drive.

Perhaps - but (as I said above), it might just as easily be a problem with filesystem corruption (a software rather than a hardware problem).
Old 11-10-2009, 01:17 AM   #12
Hal Itosis
MVP
 
Join Date: Apr 2002
Posts: 2,112
Quote:
Originally Posted by hayne
The wallclock ("real") time for that command will depend strongly on what is in your disk cache (as well as on the speed of your disk and the speed of your CPU of course).

Good catch... okay. Here is mine again, moments after restart/login (just installed SecUpd2009-006):
Code:
$ time ls /usr/share/man/man3 |wc -l
    5104

real	0m0.323s
user	0m0.111s
sys	0m0.035s
Hmm... also, you have 6352 items compared to my 5104.
[no /Developer tools for me this time... maybe later.]
All times are GMT -5.