Section 2 marked the frontiers at which we had to make breakthroughs in order to build a file system database that satisfies our expectations and design goals. Most of these breakthroughs can be applied to systems other than this particular instance.
The file system database is a product that realizes the virtual sqlization of the hierarchical file system of an operating system (OS). Only the Windows OS is considered at present.
The system is designed to have five major layers, each of which can contain its own sub-layers, as shown in Figure 2. The first layer, on top, is the data provider layer, in which users define their sqlization logic against their data source. The sqlization begins with an added relational database schema that the user would like to use to describe the data source from a relational database point of view.
The schema is fed into the production process shown in Figure 2 and Figure 4 to generate the various layers enclosed by the gray ellipse. The top and bottom layers are produced by human software developers. The details of the database are hidden as much as possible behind the entity abstraction layer, so that a modification of the database schema definition affects the manually written part of the code as little as possible. This allows an iterative development process in which the system is continuously improved or optimized.
The present version of the system has 61 virtual tables, supported by over one million lines of code. The tables represent various relations between files, folders, disks, virtual folders (see below), folder and file annotations, file backup information and database, file secondary stream information, data type family (in the polymorphism sense), file data type correlation, user account and access security database (for accessing a remote file system or data store), etc.
A file system contains various kinds of data, from system data to the user's personal data. It would not be practical to load all of it into the virtual database every time the system starts. A user is most likely to be interested in certain sets of data at a given time, so loading all the irrelevant data at once is a waste. The system treats any folder hierarchy as a database (for files and folders) whose file meta-properties are queried and loaded on demand. This significantly improves the user experience, since the system loads faster and uses less memory.
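The on-demand loading described above can be illustrated with a minimal Python sketch. It is not the system's implementation: the class and method names (`LazyFolderDatabase`, `children`, `loaded_folders`) are hypothetical, and the sketch only assumes that a folder's rows are read from the OS the first time that folder is queried.

```python
import os
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FileMeta:
    """Meta-properties the OS reports for one directory entry."""
    name: str
    path: str
    is_dir: bool
    size: int
    modified: float


class LazyFolderDatabase:
    """Hypothetical sketch: a folder is scanned only when first queried."""

    def __init__(self, root: str) -> None:
        self.root = root
        self._cache: Dict[str, List[FileMeta]] = {}

    def children(self, folder: str) -> List[FileMeta]:
        # Scan the folder on first access; later calls reuse the cached rows.
        if folder not in self._cache:
            rows = []
            with os.scandir(folder) as entries:
                for entry in entries:
                    info = entry.stat(follow_symlinks=False)
                    rows.append(FileMeta(
                        name=entry.name,
                        path=entry.path,
                        is_dir=entry.is_dir(follow_symlinks=False),
                        size=info.st_size,
                        modified=info.st_mtime,
                    ))
            self._cache[folder] = rows
        return self._cache[folder]

    def loaded_folders(self) -> List[str]:
        """Folders whose meta-properties are currently held in memory."""
        return sorted(self._cache)
```

Nothing is read at construction time, so start-up cost and memory use depend only on the folders the user actually visits.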
The operating system provides meta-properties for all files inside the corresponding file system. These meta-properties can be extracted and added to the virtual database, where the user can query and sort on them. Although they form the common denominator for all files, they contain far too little information about the data contained inside a file system.
A file in a file system can contain data of any type. Most likely, these data types carry further meta-properties and/or extended structure that can be parsed and added to the virtual database to be queried and/or sorted on, giving users a much better understanding of the data they have. Creating a virtual table for each of these sub-types would create a maintenance nightmare: significant code redundancy (and thus potential for error), a harder conceptual representation in the user interfaces, and other undesirable effects.
Object-oriented (OO) polymorphism can be used to represent the “IS A” relationship, which is made possible by our technologies. With OO polymorphism, a piece of data in a file is first of all a file, which means that a user can query the file meta-properties supported by the operating system. When needed, a user can specialize to a certain type of data, say images, and query against meta-properties that are specific to images, such as the pixel size, the author, or the title of the image. Images have further sub-classifications; for example, images taken by a modern camera contain the time at which they were taken, the place (GPS coordinates) where they were taken, information about the camera, etc. These further details can only be revealed by further specialization inside the query expression. Figure 5 shows the file family (classification) used by the current version of the system.
In addition to querying and sorting, specialization makes it easier to bind the corresponding handlers for the data contained inside the file, such as previewers, editors, players, consumers, etc., which differ from one type of data to another.
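The “IS A” specialization and handler binding described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class names, fields, handler labels and the `query` and `handler_for` helpers are hypothetical and are not taken from the system's actual file family in Figure 5.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple, Type, TypeVar


@dataclass
class File:
    name: str
    size: int


@dataclass
class Image(File):                      # an Image IS A File
    width: int = 0
    height: int = 0
    author: str = ""


@dataclass
class CameraImage(Image):               # a CameraImage IS AN Image
    taken_at: str = ""
    gps: Optional[Tuple[float, float]] = None
    camera: str = ""


T = TypeVar("T", bound=File)


def query(files: Iterable[File], kind: Type[T],
          predicate: Callable[[T], bool] = lambda f: True) -> List[T]:
    """Specialize to `kind`, then filter on properties that kind exposes."""
    return [f for f in files if isinstance(f, kind) and predicate(f)]


# Hypothetical handler table: the most specific registered type wins.
HANDLERS: Dict[type, str] = {
    File: "generic properties viewer",
    Image: "image previewer",
}


def handler_for(item: File) -> str:
    """Walk the class hierarchy from most to least specific."""
    for cls in type(item).__mro__:
        if cls in HANDLERS:
            return HANDLERS[cls]
    raise LookupError(f"no handler for {type(item).__name__}")
```

A query on `File` sees every item, while a query on `CameraImage` can additionally use `gps` or `camera`. A camera image with no handler of its own falls back to the image previewer.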
Intelligence is built into various parts of the system, in multiple forms, to improve the user experience and reduce possible errors. These features are designed to be intelligent but not “smart”: they are there to provide guidance for the user, not to make decisions on the user's behalf. Therefore, all options are currently presented to the user. Smartness can, however, be imposed at a higher layer; the bottom part of Figure 3, for example, has some of it. The three most used forms are:
Path intelligence. Every visual hierarchical file system browser on a desktop computer has a folder tree view on the left that can be used to go from one folder to another. If the target folder tree is too deep, or if a tree view is unavailable to begin with, as on a tablet or even a smart phone, the user can choose to use the breadcrumb navigator on top of the file browser. In our system, the navigator has two modes: 1) a visual mode, in which the user picks from the available sub-folders of a folder to select the path; 2) a “text” mode, in which the user enters the path by typing it in. The built-in path “code generator” can be used to specify the path using just four keys, namely the right, up and down arrow keys and the backspace key, to select or delete without actually typing anything. A node of the breadcrumb can also be clicked so that only the sub-folder tree starting at the clicked folder is displayed in the folder tree view. For a deep folder, this can be very useful for cleaning up an otherwise messy user interface. This feature seems to be unique to our product; we have not seen it in similar products on the market so far (2012-07-01).
Query intelligence. For the entities in the file family (see Figure 4), there are shortcuts on some column headers of the corresponding file list. The second way of querying is simple to use: just open the query box on the corresponding header, select the operation and enter the value to be queried. It is simplified because it does not allow building complex query expressions, and if more than one column query box has a non-empty value, the conditions are all joined by “AND”.
There is a text-based query expression editor for every entity set inside the virtual database (namely those corresponding to the 61 virtual tables). The query expression editor can be used to build arbitrarily complex query expressions. It is backed by a sub-SQL DSL generator that guides the user in constructing expressions without being too smart (namely, without making decisions for the user). Most of the input inside the query expression editor can be accomplished using four keys, namely the right, up and down arrow keys and the backspace key, to select or delete. The constructed expression can be saved and recalled later to be reused or edited.
A future version of the system will contain a graphical expression “designer” with which a user can use the mouse (or a finger on a touch screen) to drag, group and choose logic operators, constructing filter predicate expressions as complex as those built in the text-based query expression editor.
Sorting intelligence. An important part of querying a database is controlling how the output list is ordered. The system provides an intelligent visual user interface with which any combination of sorting options can be specified. It also has a simplified mode and a complete mode. A subset of all sorting operations can be performed in the simplified mode using controls on selected column headers.
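The simplified header controls described above, that is, per-column query boxes joined by “AND” and a multi-column sort specification, can be sketched in Python as follows. The operator names, the row representation as dictionaries, and the functions `filter_rows` and `sort_rows` are assumptions made for illustration; they do not reflect the system's actual code.

```python
import operator
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Any]

# Hypothetical operator names offered by a header query box.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "=": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    "contains": lambda value, needle: needle in value,
}


def filter_rows(rows: Sequence[Row],
                boxes: Dict[str, Optional[Tuple[str, Any]]]) -> List[Row]:
    """Keep rows that satisfy every non-empty header query box (AND-joined)."""
    active = {col: q for col, q in boxes.items() if q is not None}
    return [row for row in rows
            if all(OPERATORS[op](row[col], value)
                   for col, (op, value) in active.items())]


def sort_rows(rows: Sequence[Row],
              spec: Sequence[Tuple[str, bool]]) -> List[Row]:
    """Sort by several columns; spec items are (column, descending)."""
    ordered = list(rows)
    # Python's sort is stable, so sorting by the least significant
    # column first yields the combined ordering.
    for column, descending in reversed(spec):
        ordered.sort(key=lambda row: row[column], reverse=descending)
    return ordered
```

An empty query box is represented by `None` and is simply skipped, which matches the rule that only non-empty boxes take part in the conjunction.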
Paging mechanism. Paging under an arbitrary combination of sorting specification and query filter expression is not easy to handle when only one particular page may be loaded at a time. The latter requirement is important for querying a large data set, because loading the entire set into memory before paging is either impractical or takes too long. A systematic solution to the problem requires a non-trivial design of the system.
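To make the problem concrete, here is one simple way to fetch a single page under a filter and a sort key without holding the whole result set in memory. It is a generic bounded-heap technique, not the system's own solution (which the text does not describe), and the function name `load_page` and its parameters are hypothetical.

```python
import heapq
from typing import Any, Callable, Iterable, List, TypeVar

R = TypeVar("R")


def load_page(rows: Iterable[R],
              predicate: Callable[[R], bool],
              sort_key: Callable[[R], Any],
              page_index: int,
              page_size: int) -> List[R]:
    """Return page `page_index` (0-based) of the filtered, sorted rows.

    Rows are streamed once; at most (page_index + 1) * page_size of them
    are kept in memory at any time.
    """
    wanted = (page_index + 1) * page_size
    matching = (row for row in rows if predicate(row))
    head = heapq.nsmallest(wanted, matching, key=sort_key)
    return head[page_index * page_size:]
```

The cost still grows with the page index, since every page before the requested one is tracked in the heap. Avoiding that for deep pages is part of what makes a systematic solution non-trivial.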
The system is designed to allow keyword-based search over the content of files in addition to queries over meta-properties. The two can be combined in an arbitrary expression in which one or more keyword-based expressions (in Lucene syntax) can appear at any place. Given the same keyword search engine, this ability lets users control what they want to find, and how, in a much finer way than the plain single-keyword search functionality available now. The use of the Lucene.Net search engine brings the user directly to the state of the art (for small-scale search) and puts it to better use.
This feature is, however, not yet open to the public in the current version of the system.
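The idea of mixing keyword conditions with meta-property conditions in one expression can be sketched in Python as follows. The sketch is illustrative only: the `keyword` function is a toy whole-word matcher standing in for a real full-text engine, it does not parse Lucene syntax, and all names (`Doc`, `keyword`, `all_of`, `any_of`, `search`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Doc:
    name: str
    size: int
    content: str


Predicate = Callable[[Doc], bool]


def keyword(term: str) -> Predicate:
    """Toy stand-in for a full-text engine: case-insensitive whole-word match."""
    wanted = term.lower()
    return lambda doc: wanted in doc.content.lower().split()


def all_of(*parts: Predicate) -> Predicate:
    """Logical AND over any mix of keyword and meta-property predicates."""
    return lambda doc: all(part(doc) for part in parts)


def any_of(*parts: Predicate) -> Predicate:
    """Logical OR over any mix of keyword and meta-property predicates."""
    return lambda doc: any(part(doc) for part in parts)


def search(docs: Iterable[Doc], expression: Predicate) -> List[Doc]:
    return [doc for doc in docs if expression(doc)]
```

Because keyword predicates and meta-property predicates share one type, a keyword condition can appear at any position inside a nested expression, for example `all_of(keyword("budget"), any_of(lambda d: d.size > 1000, keyword("report")))`.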
Virtual folders are folders created by a user that are independent of the existing folder hierarchy of the underlying OS. They transcend the OS file system to provide an alternative view of the files of the system, and they provide a systematic means of rebuilding or restructuring an old classification scheme that is out of date or inappropriate in a new application context.
A file in an OS normally belongs to only one folder. This is especially true in practice on Windows, where hard and symbolic links, although supported by NTFS, are not exposed in the standard file browser. Many users have had to make hard choices when assigning a file to a proper folder. A feature of the virtual folder system is that a file or a real folder can belong to multiple virtual folders, which makes finding an item inside the virtual folder system much easier. It also eases the proper classification of an item that by nature belongs to multiple categories.
The content of a virtual folder can be supplied in various ways, from manual assignment to fully automatic assignment based on filter expressions.
The feature set of the virtual folder system in the present version is still quite small; it will be continuously enriched in future versions.
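The two properties of virtual folders described above, multiple membership and content supplied either manually or by a filter expression, can be sketched in Python as follows. This is a minimal illustration, not the product's data model: the `VirtualFolder` class, its fields and the `folders_of` helper are hypothetical, and files are represented simply by their path strings.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional, Set


@dataclass
class VirtualFolder:
    name: str
    # Paths assigned by hand.
    members: Set[str] = field(default_factory=set)
    # Optional filter expression that selects paths automatically.
    rule: Optional[Callable[[str], bool]] = None

    def add(self, path: str) -> None:
        self.members.add(path)

    def contents(self, known_paths: Iterable[str]) -> List[str]:
        """Manual members plus every known path matched by the rule."""
        matched = {p for p in known_paths
                   if self.rule is not None and self.rule(p)}
        return sorted(self.members | matched)


def folders_of(path: str, folders: Iterable[VirtualFolder],
               known_paths: Iterable[str]) -> List[str]:
    """Names of all virtual folders that contain `path`."""
    known = list(known_paths)
    return [f.name for f in folders if path in f.contents(known)]
```

The underlying file is never moved or copied; a virtual folder only records references, which is what allows one item to appear in several folders at once.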
The system uses the Microsoft Managed Extensibility Framework (MEF) to allow late binding of many visual data type handlers, including email message handlers, image handlers, audio/video handlers, etc. It also has interface-based plugin slots for which a developer can create arbitrary asynchronous batch file set processors.
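The notion of an interface-based plugin slot for asynchronous batch processors can be sketched in Python as follows. This is a language-neutral analogy, not MEF and not the system's actual interfaces: the abstract class `BatchFileProcessor`, the `plugin` decorator, the `REGISTRY` table and the example processor are all hypothetical.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence, Type


class BatchFileProcessor(ABC):
    """Hypothetical plugin interface: process a set of files asynchronously."""

    name: str

    @abstractmethod
    async def process(self, paths: Sequence[str]) -> List[str]:
        ...


# Plugins are looked up by name at run time rather than referenced directly.
REGISTRY: Dict[str, Type[BatchFileProcessor]] = {}


def plugin(cls: Type[BatchFileProcessor]) -> Type[BatchFileProcessor]:
    """Class decorator that fills a plugin slot."""
    REGISTRY[cls.name] = cls
    return cls


@plugin
class UppercaseNames(BatchFileProcessor):
    """Example processor: returns the upper-cased path of every file."""

    name = "uppercase-names"

    async def process(self, paths: Sequence[str]) -> List[str]:
        await asyncio.sleep(0)  # yield control, as real I/O work would
        return [path.upper() for path in paths]


async def run_plugin(name: str, paths: Sequence[str]) -> List[str]:
    """Instantiate the named plugin and run it on the given file set."""
    return await REGISTRY[name]().process(paths)
```

The host code depends only on the abstract interface and the registry, so new processors can be added without changing the host.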
The system has built-in support for accessing remote data repositories that can be mapped to a hierarchical file system, over a variety of protocols, with a sophisticated access and security domain management system. However, because this sub-system is incomplete at present, it is not yet available to the public.