digiclear:importfromv3

Importing data from DigiCleaR v3 to v4 (and SQLite to PostgreSQL)


Overall, the idea is to develop code for importing and exporting data between different databases, based on APIs exchanging files in JSON format, and also to write routines for importing data from version 3 of the application (currently in production) into version 4 (currently under development). Since the application is linked to machines, it will also be necessary to develop small local applications to recover data to be injected into the web application. The environments are very disparate (because of the machines), and this will require using different programming languages (mainly PHP).

The import and export of DigiCleaR 3 data to DigiCleaR 4, as well as the migration from SQLite to PostgreSQL, require several operations that we detail in this report. Several issues can arise with respect to these operations; they will also be addressed.

For each import step, we present a figure that summarizes the operation performed, so that these operations are easier to follow.

First of all, the first data to import were those relating to users. Given the design of the existing databases in versions 3 and 4, we chose to import the usernames first. A username must have the following form: firstname.lastname (lowercase, no special characters). The data already present in the version 3 database led us to create a script that allows:

- extracting the usernames

- transforming them to respect the chosen convention (that is to say firstname.lastname)

- modifying or even deleting unwanted ones (exact duplicates, or usernames duplicated with only one letter changed, such as stehpane instead of stephane, …)

- exporting all the usernames into the associated database

The script is designed to respect the dynamic aspect that we want to give to the application. Everything is automatic: when the script is started, a connection to SQLite is made, the data are extracted and transformed, and finally exported to PostgreSQL. Note, however, that the usernames are imported from the "data" table and not from the "user" table: most of the usernames are populated in the "data" table, and almost all usernames that did not conform to the convention had already been corrected there.
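As a rough illustration of the extraction/transformation step, the username convention can be sketched as follows. This is a minimal Python sketch for readability (the actual routines are written mainly in PHP); the function names and the accent-stripping details are assumptions, and near-duplicates such as stehpane/stephane still require a manual decision:

```python
import re
import unicodedata

def normalize_username(first_name, last_name):
    """Build a username of the form firstname.lastname:
    lowercase, accents stripped, no special characters."""
    def clean(part):
        # Decompose accented characters, then drop anything non-alphanumeric.
        part = unicodedata.normalize("NFKD", part)
        part = part.encode("ascii", "ignore").decode("ascii")
        return re.sub(r"[^a-z0-9]", "", part.lower())
    return f"{clean(first_name)}.{clean(last_name)}"

def deduplicate_usernames(usernames):
    """Keep the first occurrence of each username; exact duplicates are
    dropped. Near-duplicates (one letter apart) are not decided here."""
    seen = set()
    kept = []
    for name in usernames:
        if name not in seen:
            seen.add(name)
            kept.append(name)
    return kept
```

For example, `normalize_username("Stéphane", "Dupont")` yields `stephane.dupont`, ready to be inserted into the PostgreSQL database.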


                                           Figure 1 : Image representing the import of users


The next step concerns the sample operations. After reflection, we decided that the imported samples had to be made inactive. For the most part they are already set aside, but it is of course still possible to find them if users need them. Moreover, as with the users, a verification and modification pass is also applied to the samples in order to remove duplicates: we remove the samples that have the same name, as well as samples repeated several times with just one character changed (for example Test1, Test2, …).


                                           Figure 2 : Image representing the import of samples


We also addressed a topic that remains quite sensitive for all users (and maintainers): confidentiality. We decided that a user could read the process sheets only if the sample in question had been added to his project. However, the assumption that one cannot add a sample without belonging to the project is not necessarily true: when filling in an operations sheet, depending on the machine being used, one may handle a sample on behalf of someone else. A user can select a project to which he has added himself and will then be able to select its samples; projects and samples are independent, in that samples do not depend on projects (the converse is true). However, if an operations sheet involves both a confidential sample and a confidential project, a user who wants to see it must be added to the project, which grants the right to see everything confidential in relation to that project.
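The rule described above can be condensed into a single access check. This is an illustrative sketch only; the data shapes (a sheet dictionary, a project-membership map) are assumptions, since the real application stores this information in its own tables:

```python
def can_read_process_sheet(user_id, sheet, project_members):
    """A process sheet involving a confidential sample or a confidential
    project is readable only by users added to that project.
    project_members maps project_id -> set of user ids."""
    if not (sheet["sample_confidential"] or sheet["project_confidential"]):
        return True  # nothing confidential: readable by everyone
    return user_id in project_members.get(sheet["project_id"], set())
```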

Once the users and the samples have been correctly inserted, it is necessary to import the access rights. For each operation, we retrieve the username and the sample name (which give us the identifier(s) of the user(s) and of the sample(s)). We then insert into the rights table (sample_access) the user identifier(s), the sample identifier(s), and the assigned right.
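The lookup-then-insert step can be sketched as follows. The sample_access table name comes from the report; the column names and the helper function are assumptions, and SQLite is used here only so the sketch is self-contained (the real target is PostgreSQL):

```python
import sqlite3

def import_sample_access(conn, rows):
    """For each (username, sample_name, right) triple from version 3,
    look up the v4 identifiers and insert them into sample_access."""
    cur = conn.cursor()
    for username, sample_name, right in rows:
        cur.execute("SELECT id FROM users WHERE username = ?", (username,))
        user = cur.fetchone()
        cur.execute("SELECT id FROM samples WHERE name = ?", (sample_name,))
        sample = cur.fetchone()
        if user and sample:  # skip rows whose user or sample was removed as a duplicate
            cur.execute(
                "INSERT INTO sample_access (user_id, sample_id, access_right)"
                " VALUES (?, ?, ?)",
                (user[0], sample[0], right),
            )
    conn.commit()
```

Rows referring to a username or sample name that no longer exists (because it was removed during deduplication) are simply skipped.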


                                        Figure 3 : Image representing the import of sample access


From that moment, all the users, the samples, and the user and sample rights have been imported. The second major part of the process of importing data from version 3 to version 4 relates directly to operations.

Before importing the operations, it is necessary to establish the correspondences between the machines. The version 4 database already contained all the machines, but some had to be renamed according to the name already defined in the version 3 database. A CSV file was therefore created to compare the machine names present in the two versions, and a correspondence was established between them. For example, the machine originally named ICP-RIE SENTECH SI500 was renamed with the name assigned to that machine in the version 3 database, namely ICPSentech1.
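Reading the correspondence file can be sketched as below. The CSV column headers (`v4_name`, `v3_name`) are assumptions; only the example pair from the report is taken as given:

```python
import csv
import io

def load_machine_mapping(csv_text):
    """Read the correspondence CSV (one row per machine) and return a
    dict mapping the v4 machine name to its v3 name, used for renaming."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["v4_name"]: row["v3_name"] for row in reader}
```

With this mapping in hand, each v4 machine whose name appears as a key is renamed to the corresponding v3 name before the operations are imported.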


                                  Figure 4 : Image representing the correspondences between the machines


The data table in the version 3 database contains all the data related to the operations, which creates usability and efficiency problems. It was therefore necessary to split the information into tables based on the object concerned (here, operations). For this reason, the information related to operations is exported to three tables: operations, operation_data, and operation_samples. The operations table contains the main information of an operation: the machine and user identifiers, the type of the operation (maintenance, …), the date, and the total time.


                                        Figure 5 : Image representing the import of process


Then, the operation_data table contains all the detailed information of an operation. In the version 3 database, an operation or data sheet is represented by an identifier: all data carrying this identifier together represent one operation. Thus, for each identifier present in the operations table, we insert all the information relating to this identifier into the operation_data table. Finally, the operation_samples table contains the identifiers of the operations as well as the identifiers of the projects and the sample names of these operations. Note that an identification pass is performed on the project names, because several projects had the same or very similar names; duplicate projects are removed, and we take care to keep the correct identifiers.
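Assuming the v3 data table is a flat list of (sheet identifier, key, value) rows, the split into the three v4 tables can be sketched like this. The key names and row shape are assumptions for illustration; only the three target tables come from the report:

```python
from collections import defaultdict

def split_v3_data(rows):
    """Split flat v3 'data' rows into the three v4 tables.
    Rows sharing a sheet_id form one operation."""
    main_keys = {"machine_id", "user_id", "operation_type", "date", "total_time"}
    operations = defaultdict(dict)         # sheet_id -> main fields
    operation_data = defaultdict(dict)     # sheet_id -> remaining fields
    operation_samples = defaultdict(list)  # sheet_id -> project/sample refs
    for sheet_id, key, value in rows:
        if key in main_keys:
            operations[sheet_id][key] = value
        elif key in {"project", "sample"}:
            operation_samples[sheet_id].append((key, value))
        else:
            operation_data[sheet_id][key] = value
    return dict(operations), dict(operation_data), dict(operation_samples)
```

Every row thus ends up in exactly one of the three tables, keyed by the sheet identifier it carried in version 3.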

Once the operations have been imported and exported to the three corresponding tables, the import of data from DigiCleaR v3 to v4 (and from SQLite to PostgreSQL) is complete.

  • digiclear/importfromv3.txt
  • Last modified: 2020/06/11 18:15
  • by 127.0.0.1