Hexinverter ¬ Project Idea: Tag Filesystem for Archives and Libraries

I'm a bit obsessive when it comes to collecting and storing things, and the digital realm of course makes that much easier. The sheer density, breadth, availability, and mobility of digital artifacts make this a rewarding avenue for my collecting urge, as well as my avocation of archiving.

In any case, doing this has highlighted something that is distinctly missing from Unix-like systems: a categorical filesystem.

I realize this could be accomplished with software that runs above the filesystem level: a database which maps artifacts to categories or tags and allows searching and browsing. However, straightforward filesystem semantics are not available in that context, and to be honest it seems significantly more heavyweight than necessary.

I also realize that such things have been created in the past, but they are not widely available (in that there's nothing I can install on my system right now).

So the basic idea is that artifacts are stored in the normal manner, perhaps on an existing filesystem. Some form of metadata is associated with each file (perhaps in a special file, or using extended file attributes) which assigns each artifact to one or more categories. Then a service or daemon (the standard methods are FUSE, or exporting over NFS or SMB) reads this metadata and presents a virtual filesystem in which directories represent categories and contain the artifacts assigned to them. Categories could be hierarchical.
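To make the category-to-directory mapping concrete, here is a minimal sketch in Python. It only builds the directory listings in memory; it is not a FUSE implementation. The function name build_virtual_tree, the slash-separated category paths, and the sample file names are all assumptions made for illustration.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def build_virtual_tree(tags_by_artifact):
    """Map each virtual directory to the sorted names listed inside it.

    tags_by_artifact maps a real file path to its categories. A category
    such as "fiction/scifi" is treated as hierarchical (an assumption).
    """
    tree = defaultdict(set)
    for artifact, categories in tags_by_artifact.items():
        name = PurePosixPath(artifact).name
        for category in categories:
            parts = PurePosixPath(category).parts
            # Register each path component as a subdirectory of its parent,
            # so nested categories show up as nested directories.
            for depth, part in enumerate(parts):
                parent = "/".join(parts[:depth]) or "."
                tree[parent].add(part + "/")
            # The artifact itself is listed in the innermost category.
            tree["/".join(parts)].add(name)
    return {directory: sorted(entries) for directory, entries in tree.items()}


if __name__ == "__main__":
    # Hypothetical sample data.
    tags = {
        "/srv/books/dune.epub": ["fiction/scifi", "author/herbert"],
        "/srv/books/sicp.pdf": ["computers"],
    }
    for directory, entries in sorted(build_virtual_tree(tags).items()):
        print(directory, entries)
```

A real daemon would answer readdir requests from a structure like this and resolve each listed name back to the stored artifact.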

Filesystem semantics would be mapped to manipulating the tags (categories) associated with the artifacts. For example, cp could map to adding a category to the file, rm could map to removing a category, mv could map to removing all categories and adding one, and so on. Permissions would be handled much like the static top-level permissions in, for example, vfat (as mount parameters).
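The mapping from file operations to tag operations can be sketched as three small functions over an in-memory dictionary. This is only one possible reading of the semantics described above; the function names and the dictionary layout are assumptions, and a real implementation would sit behind the filesystem layer rather than be called directly.

```python
def cp(tags, artifact, category):
    """Copying into a category directory adds that category."""
    tags.setdefault(artifact, set()).add(category)


def rm(tags, artifact, category):
    """Removing from a category directory drops only that category."""
    tags.get(artifact, set()).discard(category)


def mv(tags, artifact, category):
    """Moving replaces every existing category with the destination."""
    tags[artifact] = {category}


if __name__ == "__main__":
    # Hypothetical artifact and category names.
    tags = {"dune.epub": {"scifi"}}
    cp(tags, "dune.epub", "classics")
    rm(tags, "dune.epub", "scifi")
    mv(tags, "dune.epub", "author/herbert")
    print(tags)
```

One open design question this leaves out is what rm should do when it removes the last category: keep the artifact untagged, or refuse.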

My use case is a digital library containing thousands of books. We could add categories like ASN, ISBN, and author last name, each containing subcategories for the specific values associated with each artifact, in addition to generic categories such as scifi, fantasy, computers, and the like. Finding a particular book would be as easy as going to its ISBN or its title. Locating a book by subject or browsing by author becomes trivial, and normal filesystem tools like find and locate would work for searching. My dream world, essentially.
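As a rough sketch of how one book record might expand into such categories, the function below turns a small dictionary into a list of category paths. The field names (isbn, author_last, title, subjects), the path layout, and the sample values are all hypothetical; the ISBN shown is a placeholder, not a real one.

```python
def categories_for_book(book):
    """Derive category paths from a book record (field names are assumed)."""
    categories = []
    # Keyed categories: one subcategory per distinct value.
    for field, prefix in (("isbn", "isbn"), ("author_last", "author"), ("title", "title")):
        value = book.get(field)
        if value:
            # Normalize so the value is usable as a directory name.
            categories.append(prefix + "/" + value.strip().lower().replace(" ", "-"))
    # Generic subject categories are used as-is.
    categories.extend(book.get("subjects", []))
    return categories


if __name__ == "__main__":
    book = {
        "isbn": "0000000000",  # placeholder value
        "author_last": "Herbert",
        "title": "Dune",
        "subjects": ["scifi"],
    }
    print(categories_for_book(book))
```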

Technologically this is completely doable. The major labor is of course creating the metadata, but that can be automated and done over time.

As I write this I realize that it'd be best to store the tags / categories as a separate file from the artifact, which would facilitate backing up and maintaining the artifact / metadata easily (tar without any special parameters could copy and back-up the system trivially). I'm certain there are existing card catalog file standards to investigate.
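A minimal sketch of the separate-file approach, assuming a JSON sidecar stored next to each artifact. The ".tags.json" suffix and the JSON format are my own placeholders, not an existing standard; a proper catalog format could replace them.

```python
import json
from pathlib import Path

SIDECAR_SUFFIX = ".tags.json"  # hypothetical naming convention


def sidecar_path(artifact):
    """Return the sidecar path that sits beside the artifact."""
    artifact = Path(artifact)
    return artifact.with_name(artifact.name + SIDECAR_SUFFIX)


def write_tags(artifact, categories):
    """Store the categories as a sorted, de-duplicated JSON list."""
    text = json.dumps(sorted(set(categories)), indent=2) + "\n"
    sidecar_path(artifact).write_text(text, encoding="utf-8")


def read_tags(artifact):
    """Load the categories, or an empty list if no sidecar exists."""
    path = sidecar_path(artifact)
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))
```

Because the sidecar is an ordinary file in the same directory, a plain tar of the library directory carries the metadata along with the artifacts.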