Introduction to DTDs

Creating Elements

In this unit you will learn to develop a DTD for a simple address book. An address book usually contains several entries, and each entry contains information about a person, including a name, an address, and a telephone number, as in the following table:

Name          Address                         Telephone number
Clark Kent    344 Clinton Street Metropolis   55 50145
Bruce Wayne   1007 Mountain Drive Gotham      53 59333

As mentioned previously, every XML document must have exactly one root element, which encloses all other elements. The root element, like every other element, must be declared in the DTD. The name of the root element should describe the content, so in our case we will use the root element <addressBook>. The following statement declares that <addressBook> is the root element:

<!DOCTYPE addressBook [
	<!ELEMENT addressBook (#PCDATA)>
]>



The declaration of the root element sits between the square brackets. A DTD element declaration always starts with <!ELEMENT and ends with the ‘greater than’ symbol, >. The definition within the round brackets means that the content of the element <addressBook> can only be of type "#PCDATA". PCDATA, or Parsed Character Data, is a data definition used in XML documents and basically means plain text with a few constraints. The most important restriction is that the ampersand (&) and the left angle bracket (<) must be escaped as &amp; and &lt;, because these characters are used to distinguish mark-up from data. The right angle bracket (>) and single and double quotes have escapes of their own (&gt;, &apos; and &quot;), which are commonly used as well; quotes strictly need escaping only inside attribute values.
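As a small illustration of escaping in PCDATA, the sketch below shows only the element itself (the DTD is omitted for brevity). The text "Kent & Wayne <private>" is made up for this example and does not come from the address table.

```xml
<!-- Intended text: Kent & Wayne <private> -->
<!-- &amp; stands for &, &lt; for < and &gt; for > -->
<addressBook>Kent &amp; Wayne &lt;private&gt;</addressBook>
```

A parser hands the unescaped text back to the application, so the entity references only appear in the stored document.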

The following video shows how a DTD should be placed at the beginning of an XML document and how an editor such as oXygen can be used to check whether an XML document is well-formed and valid.
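For readers without access to the video, the placement can be sketched roughly as follows. This is a minimal example that assumes the DTD is written inside the document itself (an internal DTD); the XML declaration on the first line is optional.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- 1. The DTD comes first, right after the optional XML declaration -->
<!DOCTYPE addressBook [
	<!ELEMENT addressBook (#PCDATA)>
]>
<!-- 2. The root element follows; its name must match the name after DOCTYPE -->
<addressBook>Clark Kent 344 Clinton Street Metropolis 55 50145</addressBook>
```

A validating editor should report this document as both well-formed and valid, since the root element contains nothing but text.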



Nesting Elements

<!DOCTYPE addressBook [
	<!ELEMENT addressBook (#PCDATA)>
]>


The DTD above specifies an <addressBook> element that can only store character data such as text and numbers:

<addressBook> Clark Kent 344 Clinton Street Metropolis 55 50145 Bruce Wayne 1007 Mountain Drive Gotham 53 59333</addressBook>

Currently only text is allowed within the <addressBook> element. An address book usually has more than one entry. To tell the computer where one address ends and the next begins, an element is needed to separate the individual entries. Building on our previous example, we will add an <entry> element so that individual addresses can be separated:

<!DOCTYPE addressBook [
	<!ELEMENT addressBook (entry*)>
	<!ELEMENT entry (#PCDATA)>
]>


Not only did we declare a second element, we also added a constraint to the <addressBook> element. The <addressBook> element can no longer contain PCDATA (text and numbers); it can now only contain <entry> elements. Moreover, the asterisk * after the element name specifies that ‘zero or more’ <entry> elements may be contained within <addressBook>. Each address entry can now be encoded as an individual <entry> element, as below:

<addressBook>
     <entry>Clark Kent 344 Clinton Street Metropolis 55 50145</entry>
     <entry>Bruce Wayne 1007 Mountain Drive Gotham 53 59333</entry>
</addressBook>
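Because the asterisk means ‘zero or more’, an address book without any entries should also be valid against this DTD. The following is a minimal sketch of that edge case, not part of the original exercise:

```xml
<?xml version="1.0"?>
<!DOCTYPE addressBook [
	<!ELEMENT addressBook (entry*)>
	<!ELEMENT entry (#PCDATA)>
]>
<!-- Valid: "zero or more" also covers an address book with no entries yet -->
<addressBook></addressBook>
```

By contrast, putting text directly inside the root element, for example <addressBook>Clark Kent</addressBook>, would still be well-formed but no longer valid, because <addressBook> is now declared to contain only <entry> elements.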

Further elements could be nested within <entry>, such as elements for the name, street, or telephone number. How would you define the elements <name>, <street> and <phoneNr> in a DTD? They should all be children of <entry> and contain only PCDATA content. Try it yourself!
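Once you have tried it, you can compare your answer with the sketch below. It is only one possible solution and rests on an assumption the exercise does not state: that every entry contains exactly one <name>, one <street> and one <phoneNr>, in that order, which is what the comma-separated list in the declaration of <entry> expresses.

```xml
<!DOCTYPE addressBook [
	<!ELEMENT addressBook (entry*)>
	<!-- Assumption: each entry holds one name, one street and one phoneNr, in this order -->
	<!ELEMENT entry (name, street, phoneNr)>
	<!ELEMENT name (#PCDATA)>
	<!ELEMENT street (#PCDATA)>
	<!ELEMENT phoneNr (#PCDATA)>
]>
<addressBook>
	<entry>
		<name>Clark Kent</name>
		<street>344 Clinton Street Metropolis</street>
		<phoneNr>55 50145</phoneNr>
	</entry>
</addressBook>
```

With this DTD, <entry> can no longer hold text directly, just as <addressBook> could not after <entry> was introduced.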