To aid the presentation of the framework, we present a spreadsheet that stores information about a housing renting system, gathering information about clients, owners and rents. It also stores prices and dates of renting. The name of each column gives a clear idea of the information it represents. The 'totalDays' column is represented by a formulas which subtracting the column 'rentFinish' to 'rentStart'. The 'total rent' column is also represented by a formula that multiplies the total number of days of renting by the rent per day value.
The spreadsheet back-end generates a refactored spreadsheet in this format. Formulas of the original spreadsheet are migrated to the new one. For the above example the resulting spreadsheet is shown in the next figure.
Moreover an additional formula is generated for each table in order to ensure the uniqueness of its primary key. Thus the resulting spreadsheet will warn the user if he introduces in the spreadsheet an already existing value for the primary key (see figure below).
The SQL back-end generates SQL code which creates the database according to the derived RDB schema. The relational database schema (here represented by an entity-relation diagram) is as follows.
The returned RDB schema defines a 3NF database, consisting of four tables. Furthermore, it produces SQL code to insert the migrated data in the database corresponding to SQL 'insert' instructions with the migrated data as argument. Because some values of the spreadsheet are defined through formulas, we generate also SQL triggers, that models the spreadsheet formulas, which are used to update the database and guarantee its integrity. Next, we present the SQL code to create and populate the 'Client' table as well as one of the generated triggers induced by the two formulas of the example.
The complete SQL code to create the database and insert the migrated data can be obtained from this link.
Experimental Results We have used the HaExcel to refactor real spreadsheets taken from the large EUSES Spreadsheet Corpus. We have produced optimized (non redundant) and more modular spreadsheets obeying to the third normal form. Moreover we generate additional formulas which guarantee that each row in each table follows the 3NF, that is, there are no two rows with the same primary key.