Sqlization is not (yet) an English word; we use it to denote the process of turning unstructured but correlated data sets into a relational database against which SQL-like queries can be performed. “Virtual” means that during the sqlization process the underlying data are neither copied nor altered, and the database tables are virtual (i.e., they do not physically exist), so the database instance is itself a virtual one. The system builds the virtual database at startup, using the current snapshot of the file system as input, then listens for changes within at least part of the file system and updates its content in response.
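As a rough illustration of a table that exists only as a view over the file system, the following Python sketch computes rows on demand instead of storing them. The class name `VirtualFileTable`, its columns, and the `select` method are hypothetical names chosen for this example, not part of the system described. For simplicity the sketch re-reads the file system on every query rather than listening for change notifications.

```python
import os
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]


class VirtualFileTable:
    """Hypothetical 'table' whose rows are derived from files on demand."""

    columns = ("path", "name", "size", "mtime")

    def __init__(self, root: str) -> None:
        # Only the root directory is remembered; no file data is copied.
        self.root = root

    def rows(self) -> Iterator[Row]:
        """Yield one row per regular file found under the root."""
        for dirpath, _dirnames, filenames in os.walk(self.root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                try:
                    st = os.stat(full)
                except OSError:
                    continue  # the file vanished between listing and stat
                yield {
                    "path": full,
                    "name": name,
                    "size": st.st_size,
                    "mtime": st.st_mtime,
                }

    def select(self, where: Callable[[Row], bool] = lambda row: True) -> List[Row]:
        """Rough equivalent of SELECT * FROM files WHERE <predicate>."""
        return [row for row in self.rows() if where(row)]
```

A call such as `VirtualFileTable("/some/dir").select(lambda r: r["size"] > 1024)` plays the role of an SQL-like query; because rows are produced at query time, files added or removed since the previous call are reflected automatically.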
Here the data sets are unstructured in the relational sense; they may well have other structures that are not, or cannot be, used directly by the process to be described. For example, the data may be a document with rich internal structure, or a web server log entry containing fields that can be categorized into different but related virtual data tables.
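The web server log case can be sketched as follows. The example assumes log lines in the Common Log Format and splits each entry into two related row sets, `clients` and `requests`, linked by a `client_id` key. The table names, column names, and the function `sqlize_log` are illustrative assumptions, and the rows are built in memory here only to make the mapping visible.

```python
import re
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]

# Common Log Format: host ident user [time] "method url protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def sqlize_log(lines: Iterable[str]) -> Tuple[List[Row], List[Row]]:
    """Split log entries into 'clients' rows and 'requests' rows."""
    client_ids: Dict[str, int] = {}  # host -> client_id
    requests: List[Row] = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not follow the assumed format
        host = match["host"]
        # Reuse the existing id for a known host, otherwise assign the next one.
        client_id = client_ids.setdefault(host, len(client_ids) + 1)
        size = match["size"]
        requests.append({
            "client_id": client_id,  # foreign key into the clients rows
            "time": match["time"],
            "method": match["method"],
            "url": match["url"],
            "status": int(match["status"]),
            "size": None if size == "-" else int(size),
        })
    clients = [{"client_id": cid, "host": host} for host, cid in client_ids.items()]
    return clients, requests
```

Each log entry contributes one row to `requests`, while repeated hosts collapse into a single `clients` row, which is the kind of relation between tables that the flat log does not state explicitly.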
Sqlization requires the creation of an artificial relational database schema for the target data system, one that can be mapped to the original data schema of the target system, depending on how the sqlization is to be carried out. Most likely the new schema can hold more information, or leave room for more information, than the original one, so that extra data correlations can be built and recorded in the new database, making the sqlized system richer in content and information.
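To make the idea of an artificial schema concrete, the sketch below writes one possible schema for a file system as SQL DDL. The `files` table maps to what the file system already records, while `tags` and `file_tags` are extra correlation tables with no counterpart in the original data. All table and column names are assumptions made for this example. The DDL is executed against an in-memory SQLite database only to check that it is valid; it does not imply that the described system stores its data in SQLite.

```python
import sqlite3
from typing import List

# Hypothetical schema: 'files' mirrors the file system, while 'tags' and
# 'file_tags' hold correlations that the file system itself does not record.
SCHEMA = """
CREATE TABLE files (
    file_id INTEGER PRIMARY KEY,
    path    TEXT NOT NULL UNIQUE,
    size    INTEGER,
    mtime   REAL
);
CREATE TABLE tags (
    tag_id INTEGER PRIMARY KEY,
    label  TEXT NOT NULL UNIQUE
);
CREATE TABLE file_tags (
    file_id INTEGER NOT NULL REFERENCES files(file_id),
    tag_id  INTEGER NOT NULL REFERENCES tags(tag_id),
    PRIMARY KEY (file_id, tag_id)
);
"""


def table_names(conn: sqlite3.Connection) -> List[str]:
    """Return the names of all tables defined in the connection."""
    cursor = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in cursor]


def build_schema() -> sqlite3.Connection:
    """Create an in-memory database and apply the schema to it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn
```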
Virtualization has several benefits. First, the target data sets can be very large and change constantly, like the file system of a user’s operating system; it is not realistic to copy and transform them all into a concrete relational database instance and keep the two copies synchronized. Second, there are cases in which the sources of the data sets can be redirected, merged, or disconnected. Virtualization creates a layer of data indirection that can be programmed to redirect to, merge with, or disconnect from data sources, either at the configuration level or at run time, making the system expandable.
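The indirection layer can be pictured with a minimal sketch in which a table holds only named references to row-producing sources, so that sources can be merged, redirected, or disconnected while the program runs. `IndirectTable` and its `attach` and `detach` methods are hypothetical names used for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator

Row = Dict[str, object]
Source = Callable[[], Iterable[Row]]


class IndirectTable:
    """Hypothetical table that forwards queries to swappable data sources."""

    def __init__(self) -> None:
        self._sources: Dict[str, Source] = {}

    def attach(self, name: str, source: Source) -> None:
        # A new name merges another source into the table;
        # an existing name redirects that slot to a different source.
        self._sources[name] = source

    def detach(self, name: str) -> None:
        # Disconnect a source; unknown names are ignored.
        self._sources.pop(name, None)

    def rows(self) -> Iterator[Row]:
        # Rows are pulled from the current sources at query time,
        # each labelled with the name of the source it came from.
        for name, source in self._sources.items():
            for row in source():
                yield {**row, "source": name}
```

Because `rows` consults the source map on each call, a change made through `attach` or `detach` takes effect on the next query without rebuilding anything.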
Sqlization of data sets has evidently existed since the advent of relational databases, because data must somehow be structurally formatted and written into the database. It is how the data are written into the database that makes the difference. The current state of sqlization technology is too broad a topic to cover here, since there are unlimited kinds of data systems. Let’s concentrate on the file system from now on.
There are many database file systems on the market. They are file systems built on top of a relational database: the original data (files) are stored in a database, so the file system is backed by a relational database. Sqlization means something different: turning an existing data system into a relational database instance. There are a few products of this kind on the market.
Little information on the virtual sqlization of file systems, or of any other data system, has been published at present.