ENCODE: 1.10 AI and ancient inscriptions: 1.10.1 The application of AI in Epigraphy: The example of some projects

1.10 AI and ancient inscriptions

This lesson presents individual projects that apply AI to the study of ancient epigraphy and provides an up-to-date list of recent studies and projects. It also discusses critical issues and best practices in the use of AI in this field of study.

1.10.1 The application of AI in Epigraphy: The example of some projects

Machine learning is the branch of Artificial Intelligence in which computers learn patterns and build models from examples in existing datasets, and then apply those models to new data. In recent years, many projects in the field of ancient cultural heritage have used machine learning to automate tasks such as the translation of ancient texts, the restoration of damaged texts, the identification of ancient workshops and hands, the attribution of ancient written artefacts to their original findspot, and the 3D representation of damaged archaeological sites. In the epigraphic field in particular, the following tools are worth mentioning:

Pythia (Thea Sommerschield, University of Oxford; Yannis Assael, DeepMind; Jonathan Prag, University of Oxford), an Ancient Greek text restoration model that recovers missing characters from a damaged text; the project will be developed further through a Marie Skłodowska-Curie postdoctoral fellowship (Thea Sommerschield, University of Oxford), which will enlarge the dataset (PythiaPlus);

Ithaca (Thea Sommerschield, University of Oxford; Yannis Assael, DeepMind, et al.), a deep neural network which restores ancient texts and attributes them to their original place and time of writing;

Interview with Thea Sommerschield about Pythia and Ithaca

Fabricius (Google Arts and Culture), a machine learning tool which offers translations of Egyptian hieroglyphs into modern languages;

Classifying Latin Inscriptions of the Roman Empire: A Machine-Learning Approach (Vojtěch Kaše, Petra Heřmánková and Adéla Sobotková, SDAM project), a machine learning classification model which uses the inscription categories of EDH to label inscriptions from EDCS, in order to standardise inscription categories according to shared vocabularies (EAGLE);

AGILe (The First Lemmatizer for Ancient Greek Inscriptions), open-source software which applies machine learning to the lemmatization of epigraphic texts, developed by a team at the University of Groningen (Evelien de Graaf, Silvia Stopponi, Jasper K. Bos, Saskia Peels-Matthey, Malvina Nissim). The model is trained on epigraphic data because inscriptions differ markedly from literary texts: they are characterised by many local alphabets, wide dialectal variation and a lack of standardised spelling.
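The general workflow described at the start of this section, and applied by the Latin inscription classification project in the list above, is to learn categories from one labelled dataset and then assign them to texts from another. It can be illustrated with a minimal sketch. This is not the method used by that project: it is a toy naive Bayes classifier written with the Python standard library only, and the training texts, category labels and function names are all invented for illustration and do not reproduce EDH, EDCS or EAGLE data.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lower-case a text and split it on whitespace."""
    return text.lower().split()

def train(labelled_texts):
    """Count words per category from (text, category) pairs."""
    word_counts = defaultdict(Counter)
    category_counts = Counter()
    for text, category in labelled_texts:
        category_counts[category] += 1
        word_counts[category].update(tokenize(text))
    return word_counts, category_counts

def classify(text, word_counts, category_counts):
    """Return the most probable category (naive Bayes, add-one smoothing)."""
    vocabulary = {word for counts in word_counts.values() for word in counts}
    total_texts = sum(category_counts.values())
    best_category, best_score = None, -math.inf
    for category, n_texts in category_counts.items():
        counts = word_counts[category]
        total_words = sum(counts.values())
        # Log prior of the category plus log likelihood of each word.
        score = math.log(n_texts / total_texts)
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category

# Invented stand-in for an already labelled dataset (hypothetical labels).
training_set = [
    ("dis manibus sacrum vixit annos", "epitaph"),
    ("hic situs est vixit annis", "epitaph"),
    ("iovi optimo maximo votum solvit libens merito", "votive inscription"),
    ("votum solvit libens merito", "votive inscription"),
    ("imperatori caesari augusto pontifici maximo", "honorific inscription"),
]
word_counts, category_counts = train(training_set)

# Invented stand-in for unlabelled texts from a second dataset.
for unlabelled in ["dis manibus vixit annos xxx", "iovi votum solvit"]:
    print(unlabelled, "->", classify(unlabelled, word_counts, category_counts))
```

Even in this toy form, the sketch shows why the quality and size of the labelled dataset matter: the classifier can only recognise formulae that it has already seen in the training examples.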

The AI historian: A new tool to decipher ancient texts
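The restoration task addressed by Pythia and Ithaca, proposing characters for a gap in a damaged text, can be illustrated in a very simplified form. The sketch below is an assumption-laden toy, far simpler than the deep neural networks used by those projects: it counts three-character sequences in a small invented training corpus (Greek without diacritics) and ranks candidate letters for a single gap, marked here with "?", by how often they were seen between the same neighbouring characters. All names and data are hypothetical.

```python
from collections import Counter

def train_trigrams(corpus):
    """Count every three-character sequence in the training texts."""
    counts = Counter()
    for text in corpus:
        padded = f" {text} "  # pad so that word edges also form trigrams
        for i in range(len(padded) - 2):
            counts[padded[i:i + 3]] += 1
    return counts

def restore(damaged, counts, alphabet, gap="?"):
    """Rank candidate characters for a single gap by the frequency of
    the trigram (left neighbour, candidate, right neighbour)."""
    padded = f" {damaged} "
    position = padded.index(gap)
    left, right = padded[position - 1], padded[position + 1]
    return sorted(alphabet, key=lambda ch: counts[left + ch + right], reverse=True)

# Invented toy corpus; a real model would be trained on a very large dataset.
corpus = [
    "εδοξεν τηι βουληι και τωι δημωι",
    "η βουλη και ο δημος",
]
counts = train_trigrams(corpus)
alphabet = sorted({ch for text in corpus for ch in text if ch != " "})

# Propose the three most likely letters for the gap.
hypotheses = restore("εδοξεν τηι βου?ηι", counts, alphabet)[:3]
print(hypotheses)
```

The sketch returns a ranked list rather than a single answer, so that the final choice among the hypotheses remains with the epigrapher.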

These tools are the product of projects which have involved the collaboration of several scholars and have developed in several stages, as in the case of Pythia and PythiaPlus mentioned above. Another important example is the project Reconsidering the Roman workshop: examining the process behind the making of inscribed texts, funded by the Institute for Data Science and Artificial Intelligence (Charlotte Tupman and Jacqueline Christmas, University of Exeter). In its first stage, the project used text recognition software to detect words in images, with the aim of gathering information about patterns in the design and creation of epigraphic texts of the Roman world and, possibly, identifying the work of individual workshops. The same scholars are now undertaking a follow-up project, in collaboration with the Alan Turing Institute and the University of Oxford, to pursue a larger-scale analysis of letter-cutting practices. Projects of this kind require not only considerable technical effort but also a great deal of work to prepare the large datasets needed to train the software.

Exercise

References

Assael, Y., Sommerschield, T., & Prag, J. (2019). Restoring ancient text using deep learning: A case study on Greek epigraphy (arXiv:1910.06262 [cs.CL]; Version 1). arXiv. https://doi.org/10.48550/ARXIV.1910.06262
Assael, Y., Sommerschield, T., Shillingford, B., Bordbar, M., Pavlopoulos, J., Chatzipanagiotou, M., Androutsopoulos, I., Prag, J., & de Freitas, N. (2022). Restoring and attributing ancient texts using deep neural networks. Nature, 603(7900), 280–283. https://doi.org/10.1038/s41586-022-04448-z
Dencker, T., Klinkisch, P., Maul, S. M., & Ommer, B. (2020). Deep learning of cuneiform sign detection with weak supervision using transliteration alignment. PLOS ONE, 15(12), e0243039. https://doi.org/10.1371/journal.pone.0243039
Gordin, S., Gutherz, G., Elazary, A., Romach, A., Jiménez, E., Berant, J., & Cohen, Y. (2020). Reading Akkadian cuneiform using natural language processing. PLOS ONE, 15(10), e0240511. https://doi.org/10.1371/journal.pone.0240511
de Graaf, E., Stopponi, S., Bos, J. K., Peels-Matthey, S., & Nissim, M. (2022). AGILe: The First Lemmatizer for Ancient Greek Inscriptions. In N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, J. Odijk, & S. Piperidis (Eds.), Proceedings of the Thirteenth Language Resources and Evaluation Conference (pp. 5334–5344). European Language Resources Association. https://aclanthology.org/2022.lrec-1.571
Lazar, K., Saret, B., Yehudai, A., Horowitz, W., Wasserman, N., & Stanovsky, G. (2021). Filling the Gaps in Ancient Akkadian Texts: A Masked Language Modelling Approach (arXiv:2109.04513 [cs.CL]). arXiv. https://doi.org/10.48550/arXiv.2109.04513
Panagopoulos, M., Papaodysseus, C., Rousopoulos, P., Dafi, D., & Tracy, S. (2009). Automatic Writer Identification of Ancient Greek Inscriptions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(8), 1404–1414. https://doi.org/10.1109/TPAMI.2008.201

Signal and Noise: Epigraphic Ventures in Machine Learning