new filesystem (more than a feature)
drjones
01-24-2002, 05:52 PM
In the pre-HisJobsness dark days of 1997 I remember an article by Henry Norr, no less, detailing how with a mere $1300 or so worth of Adaptec-RAID+two hot SCSI drives, one could almost make up for the 8.6 HFS+ filesystem's sloowness.
Then I heard that we would get NEXT-Rhapcetera for a new OS. I fully realised this was exactly the best thing and just knew it would also fix the filesystem's non-snappiness.
OK, we do have the best OS in so many ways, but speed ain't one. This piece of the OS isn't coming tomorrow either; metadata is just one part. Thing is, the new OS is here and we all want to be X-only, but even if Photoshop and my_Special_App go CocoaCarbo on Tuesday, they will be running in a piece of OS 7-8-9. Legacy is necessary, or we would be Be, which has ceased to Be.
But this is still too important. To see why, look at Scot Hacker's article, the part that deals with the FS:
http://www.osnews.com/story.php?news_id=421&page=13
journaling
user-defined attributes
filesystem that is based on a database
dropjaw speed for file access and searches
These to me are marks of a real modern operating system. We have a great deal so far. We need the rest. I believe this is worth some time to get. I plan to visit the feedback site a lot. ;)
sorry for the long first post
Originally posted by drjones
journaling
This is both easy and hard, all at the same time. Not every FS can be made to journal. For example, VFAT can never be made to support journaling: the FS structures are too simple and cannot support any metadata. I have looked at the HFS+ headers and structures, but I can't make enough sense of them to see how to plug in a journal (a la ext3, which is "ext2 with journaling"). I assume you'd have to rewrite the FS from scratch as a kext, and there would be no easy migration path (again, like the ext2->ext3 one, where you insert the module and then run one command; if for some reason the module fails to load, you always fall back safely to ext2, obviously sans journaling but without risk of data corruption). Someone could always pay Namesys to port ReiserFS, too.
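To make the journaling idea above concrete, here is a minimal sketch. It is not how ext3 works internally, and it says nothing about HFS+; the `Journal` class and the dictionary standing in for the filesystem's main structures are hypothetical. The only point it shows is the ordering: log the intended change first, apply it second, and replay the log after a crash.

```python
import json
import os
import tempfile


class Journal:
    """Toy write-ahead log (hypothetical): record the intent, then apply it."""

    def __init__(self, log_path):
        self.log_path = log_path

    def record(self, key, value):
        # Append the intended change and force it to disk before applying it.
        with open(self.log_path, "a") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())

    def replay(self, store):
        # After a crash, re-apply every logged change. Re-applying is harmless
        # here because each entry just sets a key to a value (idempotent).
        if not os.path.exists(self.log_path):
            return store
        with open(self.log_path) as log:
            for line in log:
                entry = json.loads(line)
                store[entry["key"]] = entry["value"]
        return store


def demo():
    log_path = os.path.join(tempfile.mkdtemp(), "journal.log")
    journal = Journal(log_path)
    store = {}  # stands in for the filesystem's main structures

    journal.record("a.txt", "size=10")
    store["a.txt"] = "size=10"  # logged, then applied normally
    journal.record("b.txt", "size=20")
    # Simulated crash: the change to b.txt was logged but never applied.

    recovered = journal.replay(dict(store))
    return store, recovered
```

A real journal would also need commit markers and a way to truncate the log once changes are safely applied; both are left out of this sketch.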
filesystem that is based on a database
Can someone PLEASE explain to me why this even matters? OK, I'm seriously not flaming you or anyone, but I tend to see this concept get bandied about a LOT by people who have never used (read: programmed for) any real database other than something like Access. The idea of filesystem as abstract datastore is just DUMB. Any filesystem is just 3 simple things: a stream of bytes (programs and data), a structure to manage user interaction (metadata, which is a stream of bytes itself, although the format is restricted), and a structure to support kernel-level interaction (on Unix it's inodes, for example). That's it. How any sort of relational or object-relational structure ("database") would improve FS performance (from either a kernel or userspace perspective) is beyond me. Be's solution is (IMHO) the correct one: leave the FS structure alone, improve the metadata. With an adequate interface to moderately free-form metadata you can get almost all the features of a database (queries) with none of the drawbacks (godawful performance, complexity, and fragility). Metadata - data about data - exists to provide this kind of interface.
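As a rough illustration of "queries over free-form metadata" without a relational engine, here is a small sketch. It is not BFS's actual design; the `AttributeIndex` class, its method names, and the example attributes are all made up for the purpose.

```python
from collections import defaultdict


class AttributeIndex:
    """Hypothetical index: (attribute, value) -> set of paths carrying it."""

    def __init__(self):
        self._index = defaultdict(set)

    def set_attr(self, path, name, value):
        # Simplification: an attribute is only ever added, never changed,
        # so no stale (name, old_value) entry has to be removed.
        self._index[(name, value)].add(path)

    def query(self, **criteria):
        # Intersect the path sets for every requested attribute=value pair.
        sets = [self._index.get((k, v), set()) for k, v in criteria.items()]
        if not sets:
            return set()
        return set.intersection(*sets)


idx = AttributeIndex()
idx.set_attr("/music/a.mp3", "type", "mp3")
idx.set_attr("/music/a.mp3", "artist", "Some Band")
idx.set_attr("/music/b.mp3", "type", "mp3")
matches = idx.query(type="mp3", artist="Some Band")
```

Each lookup is a couple of hash-table hits plus a set intersection, which is the kind of "cached metadata with hooks" argued for above.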
drjones
01-25-2002, 11:21 AM
Opie
I really appreciate your input. I don't even use any of the various "*base" DBs; my desire is all about getting rid of any hidden things that make hot hardware not so hot. So I retract the DB part; true, the user-defined attributes are all that is needed.
And if the journaling is radical to implement, it's not coming near-term. It's less of a need with a robust OS anyway.
BTW, I read your article. Something deep in me resonates with your outlook, but I'm such a punk compared to some that seem to pick up the *nix side of things more easily. I've installed LPPC several times, but never took to it. Now with X, Fink, XDarwin 4.20, and OroborOSX, I'm thinking I'll keep plugging away. I have a need to know, and a job with spare time. But, like Morpheus said to Neo, "Seeing the truth, as I have, and living in it, are two different things"
DJ
percy
02-14-2002, 05:37 AM
Why would it be better to have a filesystem that keeps a database on all the files residing on the hard drive? You asked why it would be more efficient for the user. For one thing, it would be like searching in iTunes compared to searching with Sherlock. Every search (if you search on name, date, file type, anything that's in the database) would be almost instantaneous. My impression of Sherlock in MacOS X is not a feeling of things happening instantaneously. Another very good part is that many attributes could be kept abstracted from the file (a stupid example would be to keep a preview of a file in the database, instead of in the file), making cataloguing these things very, very much faster.
These were just a few examples. Read about some features of BFS, for instance, and the implementation of it in BeOS. Sure, the speed of that journaling filesystem was more thanks to the programmers optimizing the drivers, but there were many other Good Things© about BFS.
Originally posted by percy
Another very good part is that many attributes could be kept abstracted from the file (a stupid example would be to keep a preview of a file in the database, instead of in the file), making cataloguing these things very, very much faster.
Nope. In any relational system, performance is always going to suck. There is a lot of attention focused on TPC and all that; but compare the overhead of a single query on a small dataset in Oracle with a big query. There is always "start up" time.
Now, OK, database doesn't necessarily equal relational. There are many different ways to implement "databases" but generally speaking, anything that isn't relational or a hash-type (gdbm for example) is just "cached metadata" with hooks and optimizations to speed up lookups.
Besides... let's look at your example (which isn't stupid). There is no need to store a preview: the file already has the preview. The file is the preview. Images can be scaled, text is one (well, a couple of) function calls away from being previewed, the list goes on. So, have the VFS just cache this stuff somewhere. Make lookups for cached data speedy. Call that a "database" if you want. It reeks of buzzword-compliance if you ask me.
I agree that BFS was pretty damn awesome.. but you'll note that they ditched the database and went with optimized metadata, which is what I have been saying all along.
Microsoft is going with a filesystem based on SQL Server.. this is going to bring this argument into the forefront, again. Specifically, OSX and Linux will be chided, flamed, and lambasted because they don't have RDBMS-based file systems (as a user of both, I feel the need to defend both!). My personal belief is it's more of a platform lock-in move than a real innovation. YMMV.
percy
02-15-2002, 04:22 AM
Okay. You may be perfectly right Opie. Maybe it's more buzzword compliance than a Real Thing©. I sort of skimmed through the database part of my programming course, so I can't really explain exactly what I mean. My only point was, a query in iTunes is a lot faster than a query in Sherlock.
For the image preview thing. Making a temporary preview of a picture takes time. This is going to be less of a problem as computers get even faster, but right now, it takes time. The time to preview one image might not be long, but take a thousand images, and having an already created preview makes sense.
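The two positions above (generate the preview from the file on demand, or keep one already created) meet in the middle with a cache. Here is a minimal sketch, assuming a preview is valid as long as the file's modification time has not changed; `PreviewCache` and `first_bytes` are hypothetical names, and "preview" here just means the first few bytes of the file.

```python
import os
import tempfile


class PreviewCache:
    """Hypothetical cache: regenerate a preview only when the file changed."""

    def __init__(self, make_preview):
        self._make_preview = make_preview
        self._cache = {}  # path -> (mtime in ns, preview)
        self.misses = 0   # how many times a preview had to be generated

    def get(self, path):
        mtime = os.stat(path).st_mtime_ns
        cached = self._cache.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]
        self.misses += 1
        preview = self._make_preview(path)
        self._cache[path] = (mtime, preview)
        return preview


def first_bytes(path, n=16):
    # Stand-in for real preview generation (scaling an image, etc.).
    with open(path, "rb") as f:
        return f.read(n)


def demo():
    path = os.path.join(tempfile.mkdtemp(), "picture.raw")
    with open(path, "wb") as f:
        f.write(b"first version")
    cache = PreviewCache(first_bytes)
    cache.get(path)
    cache.get(path)  # served from the cache, no regeneration
    misses_before_edit = cache.misses
    with open(path, "wb") as f:
        f.write(b"second version")
    # Bump the mtime explicitly so the change is visible even on
    # filesystems with coarse timestamps.
    st = os.stat(path)
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns + 1_000_000_000))
    preview = cache.get(path)
    return misses_before_edit, cache.misses, preview
```

For a thousand unchanged images, only the first pass pays the generation cost; whether that cache lives in the VFS or gets called a "database" is the naming argument from earlier in the thread.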
johnq
02-15-2002, 03:24 PM
My only point was, a query in iTunes is a lot faster than a query in Sherlock.
Sure but Percy, you aren't being fair to what's involved.
You're leaving out the overhead that iTunes has when it imports files into the Library. It's analogous to Sherlock indexing a drive - overhead that helps speed searching.
Plus you aren't accounting for the fact that iTunes is operating on a known, finite set of items.
Without limiting the search in Sherlock beforehand (like "search for just MP3s and only in this one folder") and without indexing a drive (costly overhead), poor Sherlock doesn't know to ignore the 100,000 files that might not apply to your search.
Since iTunes is searching a known, smaller set of files against a larger, known set of criteria, I'd say it's hardly fair at all to say iTunes is faster.
It's like saying you can drink 3 shots of whiskey before I can drink a whole bottle. :)
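The difference described above can be sketched in a few lines. This is not how Sherlock or iTunes is implemented; the function names are made up, and the "index" is just a dictionary keyed by file extension. The unindexed search has to visit every file, while the indexed one pays that cost once, up front.

```python
import os
import tempfile


def scan_search(root, suffix):
    """Unindexed search: walk every directory and test every file name."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


def build_index(root):
    """One-time (costly) pass that groups paths by file extension."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1]
            index.setdefault(ext, []).append(os.path.join(dirpath, name))
    return index


def indexed_search(index, ext):
    """Indexed search: a single dictionary lookup, no disk walk."""
    return sorted(index.get(ext, []))


def demo():
    root = tempfile.mkdtemp()
    os.mkdir(os.path.join(root, "sub"))
    for rel in ("a.mp3", "b.txt", os.path.join("sub", "c.mp3")):
        open(os.path.join(root, rel), "w").close()
    index = build_index(root)
    return scan_search(root, ".mp3"), indexed_search(index, ".mp3")
```

Both searches return the same files; the only difference is when the walk over the disk happens, which is the "import overhead" point made about iTunes.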
percy
02-16-2002, 04:15 AM
Of course it's not fair. What I've been talking about is a filesystem that automatically keeps an updated journal of every file that resides on your hard drive. This isn't the case with HFS+.
If we had a filesystem that knew exactly what files were on the hard drive, when they got changed, what changed, everything, we'd not have the problem you are describing. Sure, we'd have an overhead, due to indexing and more writing, but it's possible to optimize the drivers to the extent that this won't matter. If this was the case, it wouldn't matter to Sherlock that it didn't know that it was a finite set of items I wanted it to search. I could just say, search for all files with this attribute (file type mp3) and with this name, or this modification date, or this length, or this author, or whatever.
Craig R. Arko
02-16-2002, 10:08 AM
Doesn't the 'locate' DB kind of do this already? It does not update in realtime, of course, but does as part of the daily (weekly?) cron tasks.
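A snapshot database of this kind is easy to sketch, and the sketch also shows its weak spot: anything created after the last rebuild is invisible until the next one. This is not the real locate database format or code; `update_db` and `locate` are hypothetical stand-ins that keep one path per line in a text file.

```python
import os
import tempfile


def update_db(root, db_path):
    """Rebuild the snapshot: one path per line, as a periodic job would."""
    with open(db_path, "w") as db:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                db.write(os.path.join(dirpath, name) + "\n")


def locate(db_path, pattern):
    """Search the snapshot only; the real directory tree is never touched."""
    with open(db_path) as db:
        return sorted(line.rstrip("\n") for line in db if pattern in line)


def demo():
    root = tempfile.mkdtemp()
    # The snapshot file is kept outside root so it does not list itself.
    db_path = os.path.join(tempfile.mkdtemp(), "locate.db")
    open(os.path.join(root, "old.txt"), "w").close()
    update_db(root, db_path)
    open(os.path.join(root, "new.txt"), "w").close()  # created after the snapshot
    stale = locate(db_path, ".txt")
    update_db(root, db_path)
    fresh = locate(db_path, ".txt")
    return len(stale), len(fresh)
```

The search itself is fast because it never walks the disk; the price is that results are only as current as the last rebuild.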
johnq
02-16-2002, 05:36 PM
Crarko, hehe, shh! I use locate far more frequently than Sherlock; not as neat but usually what I need: speed and whereizzit.
Percy, ok I see what you mean now.
But by "it would be like searching in iTunes compared to searching with Sherlock" you meant a non-indexed Sherlock, right?
If both use indexing and both are made to search a folder, I assume speed's probably never an issue. But I'm not sure what Sherlock includes in its indexing scheme. Does it index every criterion that the advanced panel shows? Or just location and filename? I know iTunes is indexing the "extra" ID3 stuff, similarly.
Well, that's all beside your point anyway, I know.
percy
02-18-2002, 09:06 AM
That's the thing with Sherlock. It indexes the contents of a file. That's why it takes forever. I don't know what else it indexes though.
xeroply
03-11-2002, 01:05 AM
OK, so who is going to write us a VFS module to support BeFS in Mac OS X? I expect to see one by next week :D