#1
Triple-A Player
Join Date: Jan 2002
Posts: 110

count directory files quickly for massive directories?
I need to be able to get a file count for a directory that may contain more than 10,000 files.
Everything I've seen, including this hint here, basically suggests piping ls to wc -l. Unfortunately, in a directory that large, the command
Code:
ls /path/dir/ | wc -l
hangs. Is there no better, faster way to get this information than counting lines of output from a Unix command? I'd even be satisfied with a test for whether there are more than X files that just gives up, instead of trying to get a full count and hanging.
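One way to get the "more than X files, then give up" behaviour is sketched below. It assumes a find that supports -mindepth/-maxdepth; BSD/macOS and GNU find both do. The name has_more_than is hypothetical. The idea is to pipe find into head, which stops after X+1 names, so the pipeline ends early instead of walking the whole directory. Like ls | wc -l, it counts one name per line.

```shell
#!/bin/sh
# has_more_than DIR LIMIT  (hypothetical helper name)
# Exits 0 if DIR holds more than LIMIT entries, 1 otherwise.
# head quits after LIMIT+1 lines, which ends the pipeline early instead of
# listing every entry. Assumes no filename contains a newline.
has_more_than() {
    n=$(find "$1" -mindepth 1 -maxdepth 1 | head -n "$(($2 + 1))" | wc -l | tr -d ' ')
    [ "$n" -gt "$2" ]
}
```

Usage: has_more_than /path/dir 10000 && echo "more than 10,000 entries"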

#2
All Star
Join Date: May 2004
Posts: 912

You might try something using "find" instead.
I found this to be quite a bit faster than piping ls.
Code:
find /path/dir/ -maxdepth 1 -mindepth 1 | wc -l
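In the find command, -mindepth 1 leaves out the directory itself, and -maxdepth 1 keeps find from descending into subdirectories. Unlike ls without -a, it also counts hidden (dot) files. Since it still counts lines, a name containing a newline would be counted twice.

A sketch that avoids the newline problem is below. It assumes a POSIX find with -exec ... {} +, and count_entries is a hypothetical name. find hands the names to sh in batches, each sh prints how many arguments it received, and awk adds up the batch sizes.

```shell
#!/bin/sh
# count_entries DIR  (hypothetical helper name)
# Counts entries directly inside DIR, hidden ones included, without relying
# on one-name-per-line output: find passes names to sh in batches ({} +),
# each sh prints its batch size ($#), and awk sums the batch sizes.
count_entries() {
    find "$1" -mindepth 1 -maxdepth 1 -exec sh -c 'echo "$#"' sh {} + |
        awk '{ total += $1 } END { print total + 0 }'
}
```

Usage: count_entries /path/dir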

#3
All Star
Join Date: May 2004
Posts: 912

My quick test:
Code:
$ mkdir ~/test/dir/
$ for F in `seq -w 0 100000` ; do touch ~/test/dir/"$F" ; done
$ time ls ~/test/dir/ | wc -l
100001
real 0m1.448s
user 0m0.716s
sys 0m0.608s
$ time find ~/test/dir/ -maxdepth 1 -mindepth 1 | wc -l
100001
real 0m0.265s
user 0m0.088s
sys 0m0.104s
Last edited by fracai; 11-09-2009 at 09:59 AM.
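The for loop above starts one touch process per file, which can be slow for 100,001 files. A quicker way to build the same kind of test directory is sketched below. It assumes seq is available, as it was in the test above, and make_test_dir is a hypothetical name. Here xargs hands many names to each touch invocation. The names are plain zero-padded numbers, so xargs' whitespace splitting is safe.

```shell
#!/bin/sh
# make_test_dir DIR LAST  (hypothetical helper name)
# Creates empty files named 0..LAST (zero-padded by seq -w) inside DIR,
# i.e. LAST+1 files, matching `seq -w 0 100000` in the test above.
make_test_dir() {
    mkdir -p "$1" || return 1
    (
        cd "$1" || exit 1
        # xargs batches the names, so touch runs a few times, not once per file
        seq -w 0 "$2" | xargs touch
    )
}
```

Usage: make_test_dir ~/test/dir 100000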

#4
Moderator
Join Date: Jan 2002
Location: Montreal
Posts: 29,276

The fact your system is hanging on that command suggests that something else is wrong. Maybe you have filesystem corruption or maybe some of the filenames in that directory are causing problems.
__________________
hayne.net/macosx.html

#5
Major Leaguer
Join Date: May 2007
Posts: 460

I tend to use tcsh and haven't used bash much, so with that in mind, in tcsh:
Code:
set xx = *
echo $#xx
Delving into things I know nothing about, I believe there are some kernel variables that control how much memory can be used for some of this. You could also try:
Code:
echo * | xargs ls -1 | wc -l
HTH,
Brett
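In bash or another POSIX sh, a rough equivalent of the tcsh set xx = * trick is set -- *, after which $# holds the number of matches. A sketch follows, with count_glob as a hypothetical name. In bash and dash, cd, set, [ and echo are builtins, so the expanded list is never handed to an external program. Like ls without -a, * skips hidden files. In an empty directory * stays as a literal "*", and the check below turns that into 0.

```shell
#!/bin/sh
# count_glob DIR  (hypothetical helper name)
# Counts the non-hidden entries that * matches in DIR using only
# shell builtins, in a subshell so the caller's directory is unchanged.
count_glob() {
    (
        cd "$1" || exit 1
        set -- *
        # With no matches, * is left unexpanded as the literal word "*".
        if [ "$#" -eq 1 ] && [ ! -e "$1" ] && [ ! -L "$1" ]; then
            echo 0
        else
            echo "$#"
        fi
    )
}
```

Usage: count_glob /path/dir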

#6
All Star
Join Date: May 2004
Posts: 912

"echo *" may run up against argument limits. Additionally, piping echo into xargs is redundant; ls * achieves the same thing and doesn't choke on files containing spaces.
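The limit in question is ARG_MAX, the maximum size of the argument list (plus environment) that can be passed to a new program. getconf ARG_MAX reports it. The rough sketch below compares the bytes that * expands to against that limit; glob_fits_argmax is a hypothetical name. It ignores the environment and per-argument pointer overhead, so it underestimates the real usage. printf is a builtin in bash and dash, so the expansion done for it here does not itself pass through execve().

```shell
#!/bin/sh
# glob_fits_argmax DIR  (hypothetical helper name)
# Rough check: does the expansion of * in DIR fit under ARG_MAX?
# Ignores the environment and per-argument pointer overhead, so it
# underestimates what an external command like `ls *` would really need.
glob_fits_argmax() {
    (
        cd "$1" || exit 2
        # One extra byte per name stands in for its terminating NUL.
        used=$(printf '%s.' * | wc -c | tr -d ' ')
        max=$(getconf ARG_MAX)
        echo "expansion of *: about $used bytes; ARG_MAX: $max bytes"
        [ "$used" -lt "$max" ]
    )
}
```

Usage: glob_fits_argmax /path/dir || echo "ls * will probably fail here"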

#7
Major Leaguer
Join Date: May 2007
Posts: 460

I have never had an issue when the pattern is just *.
However, I have had directories where ls * gives output, but ls *.txt fails with an "argument list too long" error. In those cases, echo *.txt | xargs ls will work where ls *.txt or find . -name '*.txt' won't.
As a side note, you should be careful with large directories. In Unix, a directory is simply a file containing a list of the files it holds. I have seen people put so many files in a directory that they could not retrieve them, because the total length of all the filenames exceeded an allowable size. This limit seems to be compiled into the kernel, so it may differ from machine to machine. It may not happen in Snow Leopard (SL), but I have seen it happen in Tiger, Leopard, Solaris, and Red Hat Linux.
Again, I am only describing things I have personally experienced; that doesn't mean I am correct.
HTH,
Brett
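If the goal is only to count names that match a pattern, one option avoids building an expanded argument list entirely. Quote the pattern and let find do the matching. A sketch follows, assuming a find with -mindepth/-maxdepth; count_matching is a hypothetical name. Here -maxdepth 1 keeps the search from descending into subdirectories, as find . -name '*.txt' otherwise would. Like the other line-counting commands in this thread, it assumes no filename contains a newline.

```shell
#!/bin/sh
# count_matching DIR PATTERN  (hypothetical helper name)
# Counts entries directly inside DIR whose names match the glob PATTERN.
# PATTERN is passed quoted, so the shell does not expand it; find matches
# each name itself. Assumes no filename contains a newline.
count_matching() {
    find "$1" -mindepth 1 -maxdepth 1 -name "$2" | wc -l | tr -d ' '
}
```

Usage: count_matching . '*.txt'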

#8
MVP
Join Date: Apr 2002
Posts: 2,112

man3 is a fairly hefty directory everyone here can easily test...
Code:
$ time ls /usr/share/man/man3 |wc -l
5104
real 0m0.027s
user 0m0.019s
sys 0m0.012s

#9
Moderator
Join Date: Jan 2002
Location: Montreal
Posts: 29,276

The wallclock ("real") time for that command will depend strongly on what is in your disk cache (as well as on the speed of your disk and the speed of your CPU of course).
Compare the numbers I got on my MacBook Pro in initial and subsequent runs, both before and after I did a 'purge':
Code:
$ time ls /usr/share/man/man3 |wc -l
6352
real 0m0.911s
user 0m0.144s
sys 0m0.045s
% time ls /usr/share/man/man3 |wc -l
6352
real 0m0.152s
user 0m0.140s
sys 0m0.014s
% time ls /usr/share/man/man3 |wc -l
6352
real 0m0.155s
user 0m0.144s
sys 0m0.014s
% /usr/bin/purge
% time ls /usr/share/man/man3 |wc -l
6352
real 0m0.586s
user 0m0.144s
sys 0m0.020s
% time ls /usr/share/man/man3 |wc -l
6352
real 0m0.153s
user 0m0.136s
sys 0m0.009s
% time ls /usr/share/man/man3 |wc -l
6352
real 0m0.149s
user 0m0.138s
sys 0m0.013s
__________________
hayne.net/macosx.html
Last edited by hayne; 11-09-2009 at 07:14 PM.

#10
Triple-A Player
Join Date: Jan 2002
Posts: 110

Interesting results, guys. I get similar results. I was looking for an answer for a user who has that many files in an "Unknown Album" directory in iTunes (i.e., untagged files), and guess what: it's iTunes that puts them all there. Not best practice, true.
Based on these results, he may have a failing hard drive. Still, I'm a bit surprised that nothing in the file system maintains a count of the files in a directory that can be read from the command line. Thanks for all the reports.

#11
Moderator
Join Date: Jan 2002
Location: Montreal
Posts: 29,276

Perhaps - but (as I said above), it might just as easily be a problem with filesystem corruption (a software rather than a hardware problem).
__________________
hayne.net/macosx.html

#12
MVP
Join Date: Apr 2002
Posts: 2,112

Good catch... okay. Here is mine again, moments after restart/login (just installed SecUpd2009-006):
Code:
$ time ls /usr/share/man/man3 |wc -l
5104
real 0m0.323s
user 0m0.111s
sys 0m0.035s
[no /Developer tools for me this time... maybe later.]