About this course

What is this course about?

This course is an introduction to the theories, practices, and methods of digitizing legacy dictionaries for research, preservation and online distribution.

It focuses on a particular technique of modeling and describing lexical data using eXtensible Markup Language (XML) in accordance with the Guidelines of the Text Encoding Initiative (TEI), a de facto standard for text encoding among humanities researchers.

While there are other ways to conceptualize, store, preserve, and disseminate lexical data (such as using relational structures and relational databases), this course will not cover those methods. Rather, in this course you will get a fundamental understanding of how to analyze, identify and describe dictionary data in order to query it or share it with others.
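To give a rough idea of what describing dictionary data in order to query it can look like in practice, here is a minimal sketch in Python. The sample entry is invented for illustration and is not taken from the course materials. The element names (entry, form, orth, gramGrp, pos, sense, def) come from the TEI dictionary module, while the helper name describe_entry and the shape of its result are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET

# The TEI namespace, bound to a prefix so it can be used in path queries.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# A hypothetical, hand-written entry; real entries are usually much richer.
SAMPLE_ENTRY = """
<entry xmlns="http://www.tei-c.org/ns/1.0" xml:id="lexicography">
  <form type="lemma">
    <orth>lexicography</orth>
  </form>
  <gramGrp>
    <pos>noun</pos>
  </gramGrp>
  <sense n="1">
    <def>the practice of compiling dictionaries</def>
  </sense>
</entry>
"""


def describe_entry(xml_text):
    """Extract the headword, part of speech and definitions from one entry."""
    entry = ET.fromstring(xml_text)
    headword = entry.findtext("tei:form/tei:orth", namespaces=TEI_NS)
    pos = entry.findtext("tei:gramGrp/tei:pos", namespaces=TEI_NS)
    definitions = [d.text for d in entry.findall("tei:sense/tei:def", TEI_NS)]
    return {"headword": headword, "pos": pos, "definitions": definitions}


if __name__ == "__main__":
    print(describe_entry(SAMPLE_ENTRY))
```

Because each part of the entry is explicitly labeled, a query such as "all headwords that are nouns" becomes a matter of following element paths rather than guessing from typography.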

This course is divided into three main units:

  • in Unit 1: From Paper to Screen, you will learn about legacy dictionaries in general and what steps need to be taken to transform a paper dictionary into digital format
  • in Unit 2: Understanding and Modeling Lexical Data, you will delve deeper into the topic of dictionary structures and learn how to formally describe the various constituent parts of a dictionary entry using eXtensible Markup Language (XML)
  • in Unit 3: TEI for Dictionaries, you will become familiar with best practices for encoding dictionaries using the Guidelines of the Text Encoding Initiative

There are no prerequisites for this course. You will, however, be asked to bring along a dictionary that you want to digitize and to use your own materials in the exercises. We will provide some exercise material, but you will learn much more if you venture out and work on your own data as well.

Who created this course?

Toma Tasovac is the director of the Belgrade Center for Digital Humanities and a self-proclaimed dictionary freak. In a previous life, Toma studied Russian Language and Literature at Harvard and Comparative Literature at Princeton, but then discovered his inner computer nerd. Today, Toma works on lexical resources, digital editions of literary texts and broader issues related to digital research infrastructures.


This course has been created as a collaboration between #dariahTeach and the Working Group "Retrodigitizing Dictionaries" of the European Network for e-Lexicography (ENeL).

Special thanks go out to all the participants of the workshop Toward Best-Practice Guidelines for Encoding Legacy Dictionaries from ENeL, DARIAH and PARTHENOS. Although the work on the best-practice guidelines had not been completed by the time this course launched in its beta version, the main principles of this community-driven initiative have been adopted in the creation of this course, and the course will be kept up to date with developments in the Guidelines.

Lessons in Unit 2 that deal with markup languages, text modeling and basic rules of XML have been partially adapted from the #dariahTeach course on "Text Encoding and TEI" by Susan Schreibman and Roman Bleier.

Lesson 2 in Unit 1 has been adapted from Jesse de Does's post on DigiLex. The Grimm case study has been adapted from Vera Hildenbrand's post on the same blog.

Last modified: Sunday, 17 September 2017, 7:15 AM