SINNO stands for Significant INNOvations. It is a relational database that I am developing for the SWINNO project, which is about significant Swedish innovations. The project’s webpage introduces the database’s content and explains what ‘significant’ means. In these pages, I explain the database’s functionality and design. I would also like to share some experiences of what it is like to develop a database in a real-world situation where textbook ideals go into the waste bin on day one.
Functions and features
The SINNO database is a relational database that supports academic research into innovations. From the beginning, it was designed above all to collect data on innovations and to control and improve data quality. By contrast, it is not meant for searching for innovations or for analyzing the data, at least not for the time being. Very few people are interested in searching for data on individual innovations, and most researchers in the field of innovation studies are not used to analyzing data within relational databases. Instead, for research purposes, the database delivers data in the form of tables, which are analyzed with spreadsheet or statistical software. Characteristically, SINNO’s predecessor was a spreadsheet.
Collection managers defining ‘variables’ for academic research
Within SINNO, so-called ‘collection managers’ are data managers who are responsible for their respective subsets of the data. An important feature of the database is that it allows them to define how the data is organized and which data-quality checks apply.
First, a brief introduction to what data means for the users of the SINNO database is necessary. They work in the discipline of economic history, which mostly works with quantitative data. This data is defined in terms of ‘variables’. Such a variable could, for example, be the year in which an innovation was put on the market, or a code that describes the status of an innovation: on the market, still under development, finished but not yet for sale, or other. Sometimes a variable is a composite of several variables. That sounds funny to a programmer, but it is how they talk about it. For example, the SWINNO data includes a variable called ‘collaborating actor’, which has a name, a country and a type.
A variable can also have multiple values of the same type. For example, an innovation can have multiple inventors. Finally, variables can both consist of multiple variables and have multiple values. In the SWINNO database, this is the case for the collaborating-actor variable since a company can of course collaborate with multiple other companies or organizations for the same innovation.
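To a programmer, a composite variable with multiple values looks like a list of small records. The following Python sketch only illustrates that idea and is not SINNO's actual schema; the class names, the field names (`title`, `market_year`, `actor_type` and so on) and the sample values are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CollaboratingActor:
    """A composite variable: one value made up of three parts."""
    name: str
    country: str
    actor_type: str


@dataclass
class Innovation:
    """Hypothetical record holding the kinds of variables described above."""
    title: str
    # Simple variable with a single value.
    market_year: Optional[int] = None
    # Variable with multiple values of the same type.
    inventors: List[str] = field(default_factory=list)
    # Variable that is both composite and multi-valued.
    collaborating_actors: List[CollaboratingActor] = field(default_factory=list)


# Invented sample data, for illustration only.
example = Innovation(
    title="Example innovation",
    market_year=1985,
    inventors=["A. Inventor", "B. Inventor"],
    collaborating_actors=[
        CollaboratingActor("Example AB", "Sweden", "company"),
        CollaboratingActor("Example University", "Sweden", "university"),
    ],
)
```

In a relational database, each of these lists would typically become a separate table whose rows refer back to the innovation.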
Back to SINNO’s feature: through a range of settings, the collection managers can define the variables as necessary. They can give each variable a label, a short description that shows up as a tooltip, and a longer description for the data editors and researchers who want to know exactly how the variable is defined. They can also choose what kind of data the variable contains: a text, a number, a year, a date, or a code. In the case of a code, they can make a list of codes and set some attributes for them, including short and elaborate descriptions of what the codes mean and how they should be used.
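The settings described here can be pictured as a small definition record per variable. The sketch below is a minimal illustration under assumed names (`DataKind`, `Code`, `VariableDefinition`); the code values "1" to "4" are invented, and the real SINNO settings may be organized quite differently.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class DataKind(Enum):
    """The kinds of data a variable can contain, as listed in the text."""
    TEXT = "text"
    NUMBER = "number"
    YEAR = "year"
    DATE = "date"
    CODE = "code"


@dataclass
class Code:
    value: str
    short_description: str
    long_description: str = ""


@dataclass
class VariableDefinition:
    label: str
    tooltip: str          # short description shown as a tooltip
    description: str      # longer description for editors and researchers
    kind: DataKind
    codes: List[Code] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Assumption: a code list only makes sense for code variables.
        if self.codes and self.kind is not DataKind.CODE:
            raise ValueError("only code variables can carry a code list")


# Hypothetical definition of the status variable mentioned earlier.
status = VariableDefinition(
    label="Status",
    tooltip="Market status of the innovation",
    description="Describes whether the innovation is available on the market.",
    kind=DataKind.CODE,
    codes=[
        Code("1", "On the market"),
        Code("2", "Still under development"),
        Code("3", "Finished but not yet for sale"),
        Code("4", "Other"),
    ],
)
```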
The collection managers can also define certain quick checks on a variable. For example, they can specify that the variable may not be empty or that its value must lie within a specific range. For variables with multiple values, they can also set how many values must or may be entered.
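As a rough sketch of how such quick checks could be evaluated, the following Python function takes the values entered for one variable and returns a list of error messages. The names `QuickChecks` and `run_checks`, the wording of the messages and the sample year range are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QuickChecks:
    """Hypothetical check settings for one variable."""
    required: bool = False             # the variable may not be empty
    minimum: Optional[float] = None    # lower bound of the allowed range
    maximum: Optional[float] = None    # upper bound of the allowed range
    min_values: int = 0                # how many values must be entered
    max_values: Optional[int] = None   # how many values may be entered


def run_checks(checks: QuickChecks, values: List[Optional[float]]) -> List[str]:
    """Return one message per violated check; an empty list means all okay."""
    errors: List[str] = []
    filled = [v for v in values if v is not None]
    if checks.required and not filled:
        errors.append("a value is required")
    if len(filled) < checks.min_values:
        errors.append(f"at least {checks.min_values} value(s) expected")
    if checks.max_values is not None and len(filled) > checks.max_values:
        errors.append(f"at most {checks.max_values} value(s) allowed")
    for v in filled:
        if checks.minimum is not None and v < checks.minimum:
            errors.append(f"{v} is below the minimum of {checks.minimum}")
        if checks.maximum is not None and v > checks.maximum:
            errors.append(f"{v} is above the maximum of {checks.maximum}")
    return errors


# Example: a required year that must lie within an assumed range.
year_checks = QuickChecks(required=True, minimum=1970, maximum=2030)
problems = run_checks(year_checks, [1965])
```

Returning a list of messages rather than stopping at the first problem fits an interface that lists all errors found for a record.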
Collection managers can switch logging on or off per field. Logging here means that every change made to the field’s content is listed in the log table, together with the moment of the change and, for privacy reasons, only the three-letter alias of the editor. Such logging is considered important for later quality control, because it makes it possible to see the edit history of a field or record. When there is a suspicion that a particular data editor entered data into a particular field in the wrong way, for example because they misunderstood the description, that editor’s edits to the field can be retraced and reviewed across all records.
The collection managers’ settings shape how data editors access and edit the data. The following sub-sections give the highlights of the interface for data editing.
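The logging idea can be sketched in a few lines of Python. This is an in-memory illustration only, not the log table SINNO actually uses; the names `LogEntry`, `EditLog`, `apply_edit` and `edits_by`, and the choice to store both the old and the new value, are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, List


@dataclass
class LogEntry:
    record_id: int
    field_name: str
    old_value: Any
    new_value: Any
    changed_at: datetime
    editor_alias: str  # three-letter alias instead of a full name


class EditLog:
    def __init__(self, logged_fields: Iterable[str]) -> None:
        # Fields for which logging is switched on.
        self.logged_fields = set(logged_fields)
        self.entries: List[LogEntry] = []

    def apply_edit(self, record: Dict[str, Any], record_id: int,
                   field_name: str, new_value: Any, editor_alias: str) -> None:
        """Change a field and log the change if logging is on for that field."""
        old_value = record.get(field_name)
        record[field_name] = new_value
        if field_name in self.logged_fields and old_value != new_value:
            self.entries.append(LogEntry(
                record_id, field_name, old_value, new_value,
                datetime.now(timezone.utc), editor_alias,
            ))

    def edits_by(self, editor_alias: str, field_name: str) -> List[LogEntry]:
        """All logged edits by one editor to one field, across all records."""
        return [e for e in self.entries
                if e.editor_alias == editor_alias and e.field_name == field_name]


# Example: logging is on for 'market_year' but off for 'title'.
log = EditLog(logged_fields={"market_year"})
record = {"title": "Example innovation", "market_year": 1985}
log.apply_edit(record, 1, "market_year", 1986, "abc")
log.apply_edit(record, 1, "title", "Renamed innovation", "abc")
```

The `edits_by` lookup corresponds to the review scenario described above: retracing what one editor did to one field in all records.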
Reminder of basic inclusion rules
The SWINNO data is manually collected from professional and trade journals. To be included, an innovation has to meet three criteria. Before adding an innovation, data editors are reminded of these criteria on an overview screen.
Please note that the screenshot does not exactly reflect the currently used set of criteria.
Check whether an item has already been entered
A problem for many, if not all, databases is the occurrence of duplicates. In SWINNO, multiple data editors work on the data, and none of them has worked on it from beginning to end. This means that none of them has a detailed overview of which innovations have already been entered. Therefore, before adding a new innovation, the interface invites data editors to search for the innovation in the existing data first.
The two (up to four) columns on the left show the search results for the individual criteria. The right column shows the results that meet all criteria at the same time.
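The layout of the search results, with one column per criterion and one column for the combined result, amounts to computing a set of matches per criterion and then intersecting them. The Python sketch below illustrates this with a simple case-insensitive substring match; the function name `search_duplicates`, the field names and the sample records are assumptions, and the matching SINNO really uses may differ.

```python
from typing import Any, Dict, Set, Tuple


def search_duplicates(
    records: Dict[int, Dict[str, Any]],
    criteria: Dict[str, str],
) -> Tuple[Dict[str, Set[int]], Set[int]]:
    """Return the matching record ids per criterion and the ids matching all."""
    per_criterion: Dict[str, Set[int]] = {}
    for field_name, needle in criteria.items():
        needle_lower = needle.lower()
        per_criterion[field_name] = {
            record_id
            for record_id, record in records.items()
            if needle_lower in str(record.get(field_name, "")).lower()
        }
    # The combined column: records that satisfy every criterion at once.
    if per_criterion:
        combined = set.intersection(*per_criterion.values())
    else:
        combined = set()
    return per_criterion, combined


# Invented sample data, for illustration only.
existing = {
    1: {"name": "Robot arm X", "company": "Example AB"},
    2: {"name": "Robot gripper", "company": "Other AB"},
    3: {"name": "Pump", "company": "Example AB"},
}
columns, all_criteria = search_duplicates(
    existing, {"name": "robot", "company": "example"}
)
```

Showing the per-criterion results next to the combined result helps when one of the search terms is spelled differently in the existing data, because a near match still shows up in at least one column.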
An icon to the right of every field indicates the error status of the field. From top to bottom: all okay, error found, or no checks specified. Not shown: the asterisk (*) that marks an obligatory field.
If errors are found, then they are listed at the bottom of the screen.
The descriptions entered by the collection managers are shown in a small text box in the bottom-left corner of the screen. The user can also open the box as a larger pop-up, which includes additional information if the field has a coding system.
In the example of the product code in the previous section, the variable has an elaborate coding system which contains hundreds of codes, each of which may be accompanied by elaborate descriptions of what a code covers and what not. For such long lists, a drop-down list is not very helpful. Instead, SINNO offers a code selector (also as a pop-up) that allows users to search, browse and select codes.
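The search and browse functions of such a code selector could work roughly as in the following Python sketch. The function names, the assumption that codes are hierarchical so that a prefix selects one branch, and the sample codes themselves are invented for this illustration and are not taken from the SWINNO coding system.

```python
from typing import Dict, List, Tuple


def search_codes(codes: Dict[str, str], query: str) -> List[Tuple[str, str]]:
    """Return (code, description) pairs whose code or description contains the query."""
    needle = query.strip().lower()
    if not needle:
        # An empty query returns the full list, sorted by code.
        return sorted(codes.items())
    return sorted(
        (code, text)
        for code, text in codes.items()
        if needle in code.lower() or needle in text.lower()
    )


def browse_codes(codes: Dict[str, str], prefix: str) -> List[Tuple[str, str]]:
    """Return all codes starting with a prefix, assuming hierarchical codes."""
    return sorted(
        (code, text) for code, text in codes.items() if code.startswith(prefix)
    )


# Invented sample codes, for illustration only.
PRODUCT_CODES = {
    "10": "Food products",
    "10.1": "Meat products",
    "20": "Chemicals",
    "20.1": "Basic chemicals",
}
matches = search_codes(PRODUCT_CODES, "chem")
branch = browse_codes(PRODUCT_CODES, "10")
```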
Progress and quality monitoring
The work of data managers is supported with graphs to monitor data editing progress and the types of sources that support the data.