ENCODE: 3.4 Summary and best practices

This lesson recaps essential database concepts and highlights best practices. It emphasises efficient database management techniques and the importance of using digital tools effectively. By distilling the key insights, it provides actionable guidelines for improving database performance and maintaining data integrity in various contexts.

Summary and best practices

Most digital resources for studying ancient writing cultures are built on databases. These are most often either relational databases or XML databases, although other database models also exist.

While XML databases are essentially built by editing and uploading XML documents (which are their core), relational databases require the use of an interface (or of command line tools) to build the structure of the database (its tables) and populate it (enter the data). Building a database structure requires modelling the data, be it in the form of tables and relationships or in the form of hierarchical XML annotation.
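To make the difference between the two approaches more concrete, the following minimal sketch in Python (standard library only) stores the same invented record twice: once as two related tables in a temporary SQLite database, and once as a small hierarchical XML document. All table, column and element names and all values are hypothetical, chosen for illustration only; they do not follow any real project schema or encoding standard.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Relational model: the data is split into tables linked by keys.
# Table and column names are hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE place (
    place_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);
CREATE TABLE inscription (
    insc_id       INTEGER PRIMARY KEY,
    transcription TEXT NOT NULL,
    language      TEXT,
    place_id      INTEGER REFERENCES place(place_id)  -- relationship to 'place'
);
""")
conn.execute("INSERT INTO place VALUES (1, 'Example site')")
conn.execute("INSERT INTO inscription VALUES (10, 'Example text', 'la', 1)")
conn.commit()

# Rows from the two tables are combined at query time with a JOIN.
row = conn.execute("""
    SELECT i.insc_id, i.transcription, p.name
    FROM inscription AS i JOIN place AS p ON i.place_id = p.place_id
""").fetchone()
print(row)  # (10, 'Example text', 'Example site')

# XML model: the same information nested in one annotated document.
# Element names are hypothetical, not taken from a real encoding standard.
xml_doc = """
<inscription id="10" language="la">
  <findspot>Example site</findspot>
  <transcription>Example text</transcription>
</inscription>
"""
root = ET.fromstring(xml_doc.strip())
print(root.get("id"), root.findtext("transcription"), root.findtext("findspot"))
# 10 Example text Example site
```

In the relational version the place is stored once and referenced by key, so it can be shared by many inscriptions; in the XML version the information sits inside the document it describes, which suits texts that are themselves annotated.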

Once created, databases can be accessed through a command line interface (terminal) or through various kinds of graphical interfaces. Most users, including researchers, will access and search databases through dedicated online graphical user interfaces (GUIs). Such interfaces (dynamic websites) are relatively complex systems whose development requires a good deal of programming knowledge. They also need to be designed to be functional and to fit the needs of different users. GUIs are therefore best thought of as a collaborative creation involving researchers, programmers and designers, and the necessary funding!

Interfaces, like digital platforms in general, are not easy to maintain: they are prone to becoming obsolete and, ultimately, inaccessible. For this reason, projects should always have a plan for the long-term preservation of their data, in formats (such as plain text, XML or HTML) that are easy to store in public and institutional repositories and are considered less prone to obsolescence. Documentation of the database structure, in which tables and fields (for relational databases) or tags (for XML databases) are explicitly described in a way that people outside the project can understand, should also be part of the long-term storage of a database. This also makes the project's work easier to share, disseminate and reuse, in line with the FAIR principles (Findability, Accessibility, Interoperability, and Reuse of digital assets).
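As an illustration of what such a preservation and documentation step might look like, here is a minimal Python sketch, assuming a SQLite database: it exports a table to CSV (plain text) and generates a plain-text description of the table's fields. The table, the field names and the descriptions in the hypothetical `FIELD_NOTES` dictionary are invented; real documentation would be written by the project team and would usually say much more about each field.

```python
import csv
import io
import sqlite3

# A tiny hypothetical database standing in for a project's data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inscription (
    insc_id       INTEGER PRIMARY KEY,
    transcription TEXT NOT NULL,
    language      TEXT
);
INSERT INTO inscription VALUES (1, 'Example text A', 'la');
INSERT INTO inscription VALUES (2, 'Example text B', 'grc');
""")

# Hypothetical human-readable descriptions of each field.
FIELD_NOTES = {
    "insc_id": "Unique identifier of the inscription",
    "transcription": "Transcription of the text",
    "language": "Language code of the text",
}

# Note: table names are interpolated directly into SQL below,
# which is acceptable only for trusted, internal names.

def export_csv(conn, table):
    """Dump one table to CSV text, a plain-text format suitable for archiving."""
    cur = conn.execute(f"SELECT * FROM {table}")
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([col[0] for col in cur.description])  # header row
    writer.writerows(cur)
    return buf.getvalue()

def describe_table(conn, table):
    """Produce a plain-text data dictionary entry for one table."""
    lines = [f"Table: {table}"]
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk)
    for _cid, name, col_type, notnull, _default, pk in conn.execute(
        f"PRAGMA table_info({table})"
    ):
        flags = []
        if pk:
            flags.append("primary key")
        if notnull:
            flags.append("required")
        extra = ", " + ", ".join(flags) if flags else ""
        note = FIELD_NOTES.get(name, "(undocumented)")
        lines.append(f"  {name} ({col_type}{extra}): {note}")
    return "\n".join(lines)

print(export_csv(conn, "inscription"))
print(describe_table(conn, "inscription"))
```

Reading the column list from the database itself (here via SQLite's `PRAGMA table_info`) helps keep the documentation in step with the actual structure, while the human-written notes supply the meaning that the structure alone cannot convey.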

Finally, since not all possible research questions and queries can be anticipated and built into the search options of an online graphical interface, it can be advantageous for researchers and students to learn how to search a database directly through its native query language: SQL for most relational databases and XQuery/XPath for XML databases. This also allows them to perform more advanced manipulation of the data (adding, changing or updating records). In addition, off-the-shelf visualisation and exploration tools can conveniently provide further ways to browse and search the data.
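The kind of direct access described above can be sketched in Python as follows, assuming a small SQLite database and an equally small XML document with invented content. The SQL part runs an ad hoc count that a search form might not offer and then corrects one value; the XML part uses the limited XPath subset supported by Python's `xml.etree.ElementTree`. A full XPath/XQuery engine, such as the one built into an XML database, supports much richer queries. All names and values are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data; names and values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inscription (
    insc_id  INTEGER PRIMARY KEY,
    language TEXT,
    material TEXT
);
INSERT INTO inscription VALUES (1, 'la',  'stone');
INSERT INTO inscription VALUES (2, 'grc', 'stone');
INSERT INTO inscription VALUES (3, 'la',  'bronze');
INSERT INTO inscription VALUES (4, 'la',  'stone');
""")

# An ad hoc question a search form might not offer:
# how many inscriptions are there per language and material?
counts = conn.execute("""
    SELECT language, material, COUNT(*) AS n
    FROM inscription
    GROUP BY language, material
    ORDER BY n DESC, language, material
""").fetchall()
print(counts)  # [('la', 'stone', 2), ('grc', 'stone', 1), ('la', 'bronze', 1)]

# Direct manipulation of the data: correct one value with a parameterised query.
with conn:  # commits on success, rolls back on error
    conn.execute("UPDATE inscription SET material = ? WHERE insc_id = ?", ("marble", 4))

# XML side: ElementTree understands only a limited subset of XPath.
corpus = ET.fromstring("""<corpus>
  <inscription id="1" language="la"><material>stone</material></inscription>
  <inscription id="2" language="grc"><material>stone</material></inscription>
  <inscription id="3" language="la"><material>bronze</material></inscription>
</corpus>""")
latin_ids = [e.get("id") for e in corpus.findall(".//inscription[@language='la']")]
print(latin_ids)  # ['1', '3']
```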

A proposal for best practice in building a (Humanities) database

1. Preliminary planning questions: What do we need the database for? For which research question(s)? For which users? Where will it be stored? Will it be published? In which form? Will there be an online GUI? Do I have the necessary expertise and funding to build one? Where will the GUI be hosted? What is the long-term storage plan?

2. Choosing the database model in which to store your data. At present, the main options in the fields of epigraphy and papyrology are relational databases and XML databases. The choice of database model should be guided by the kind of data, the purpose for which it is stored, the actual availability of tools and expertise, and the needs of present and future users.

3. Data modelling. As no reality can be fully represented by a model, data modelling is about choosing the right aspects of reality to include in the model, where "right" means suited to the needs of the users and to the research questions of a given project.

4. Building the database structure. For relational databases, this also means choosing an interface (GUI or command line) with which to build the database structure. This step can overlap with data modelling, since modelling often involves trial and error and prototyping. The structure should be documented.

5. Populating the database (entering the data). Again, for relational databases, this also means choosing an interface (GUI or command line) with which to enter the data. Depending on the needs and expertise of the users, this can be the same GUI used to build the structure, an off-the-shelf product, or a bespoke GUI.

6. Building a graphical user interface. This involves defining the target users and their needs (UX design), building (programming) the necessary infrastructure, and planning who will host it and for how long.

7. Using the database through the online interface, visualisation tools or directly through a query language, according to needs and expertise.
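Steps 4 and 5 can be illustrated with a minimal Python/SQLite sketch, assuming a relational model. The structure is first built as a throw-away in-memory prototype and extended when a missing field turns up (trial and error); it is then populated with parameterised inserts inside transactions, so that constraints such as foreign keys reject inconsistent data. All table names, column names and values are hypothetical.

```python
import sqlite3

# Step 4 (sketch): build a prototype structure in memory, so that it can be
# thrown away and rebuilt while the data model is still being tested.
# All table and column names are hypothetical.
SCHEMA = """
CREATE TABLE place (
    place_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE
);
CREATE TABLE inscription (
    insc_id       INTEGER PRIMARY KEY,
    transcription TEXT NOT NULL,
    place_id      INTEGER NOT NULL REFERENCES place(place_id)
);
"""
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # foreign keys are off by default in SQLite
conn.executescript(SCHEMA)

# Trial and error: prototyping reveals a missing field, so the model is extended.
# ADD COLUMN appends the new column at the end of the table.
conn.execute("ALTER TABLE inscription ADD COLUMN date_text TEXT")

# Step 5 (sketch): populate the tables with parameterised inserts inside a
# transaction, so that a faulty batch is rolled back as a whole.
places = [(1, "Example site A"), (2, "Example site B")]
inscriptions = [
    (1, "Example text A", 1, "2nd c. AD"),
    (2, "Example text B", 2, None),  # NULL: date unknown
]
with conn:
    conn.executemany("INSERT INTO place VALUES (?, ?)", places)
    conn.executemany("INSERT INTO inscription VALUES (?, ?, ?, ?)", inscriptions)

# The constraints refuse inconsistent data, e.g. a reference to a place
# that does not exist; the failed transaction is rolled back.
rejected = False
try:
    with conn:
        conn.execute(
            "INSERT INTO inscription VALUES (?, ?, ?, ?)",
            (3, "Example text C", 99, None),
        )
except sqlite3.IntegrityError as err:
    rejected = True
    print("rejected:", err)

print(conn.execute("SELECT COUNT(*) FROM inscription").fetchone()[0])  # 2
```

Letting the database enforce such rules, rather than relying only on careful data entry, is one simple way to protect data integrity whichever interface is used to enter the data.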