René Nyffenegger's collection of things on the web

Document type definition (DTD)

External DTD

SYSTEM

<!DOCTYPE root-element SYSTEM "http://xyz.qq/abc/def.dtd">
SYSTEM is followed by a system identifier: a URI from which a validating parser retrieves the DTD when it validates an XML document.

PUBLIC

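<!DOCTYPE root-element PUBLIC "path/abc/def.dtd">
PUBLIC introduces a public identifier, a location-independent name that a parser can map to a local (for example cached or catalogued) copy of the DTD. This short form is legal only in SGML; XML requires a system identifier after the public identifier, as in the mixed form.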

Mixing

<!DOCTYPE root-element PUBLIC "path/abc/def.dtd"  "http://xyz.qq/abc/def.dtd">
This form uses http://xyz.qq/abc/def.dtd only if the parser cannot resolve the public identifier path/abc/def.dtd.
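How a parser finds a local copy for a public identifier is not defined by XML itself. One common mechanism is an OASIS XML Catalog. The sketch below assumes a catalog-aware parser, and the local path /usr/share/dtd/def.dtd is a hypothetical example:

```xml
<?xml version="1.0"?>
<!-- Maps the public identifier to a local copy of the DTD. -->
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="path/abc/def.dtd"
          uri="file:///usr/share/dtd/def.dtd"/>
  <!-- Maps the system identifier to the same local copy. -->
  <system systemId="http://xyz.qq/abc/def.dtd"
          uri="file:///usr/share/dtd/def.dtd"/>
</catalog>
```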

The structure of a DTD

Essentially, a DTD contains ENTITY, ELEMENT, ATTLIST and NOTATION declarations.

<!ENTITY>

<!ENTITY foo "bar">
foo is used like
&foo;
<?xml version="1.0"?>
<!DOCTYPE bar 
[
  <!ENTITY foo "HERE COMES THE REPLACED STRING">
  <!ELEMENT bar ANY>
]>
<bar>&foo;
</bar>
External entity:
<?xml version="1.0"?>
<!DOCTYPE bar 
[
  <!ENTITY foo SYSTEM "file:///c:/x.txt">
  <!ELEMENT bar ANY>
]>
<bar>&foo;
</bar>
&foo; is now replaced with the contents of x.txt.
Parameter entities: A parameter entity is declared with a percent sign after ENTITY and is referenced as %name; rather than &name;.
<!ENTITY % warning "severe | error | fatal">
A parameter entity holds DTD text, not document text. A reference to it can appear only within the DTD, never in the body of an XML document.
In the internal subset, parameter entity references may appear only between markup declarations, not inside them.
<?xml version="1.0" ?>
<!DOCTYPE r SYSTEM "file:///c:/path/to/parameter_entity.dtd"
[
  <!ENTITY % warning "error | severe | fatal">
]>

<r>

   <severe>x</severe>
   <error> y</error>
   <fatal> z</fatal>

</r>
parameter_entity.dtd
<!ELEMENT r    ((%warning;)*)>

<!ELEMENT severe    (#PCDATA)>
<!ELEMENT error     (#PCDATA)>
<!ELEMENT fatal     (#PCDATA)>
Because the parser reads the internal subset first, %warning; is already defined when it reads the DTD in the external subset.
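Within the internal subset, a parameter entity can still be referenced between declarations, as long as its replacement text consists of complete declarations. A minimal sketch (the entity name decls is made up for illustration):

```xml
<?xml version="1.0"?>
<!DOCTYPE r [
  <!-- The replacement text is one complete declaration. -->
  <!ENTITY % decls "<!ELEMENT r (#PCDATA)>">
  <!-- Allowed: the reference stands between declarations. -->
  %decls;
  <!-- Not allowed in the internal subset: a reference inside a
       declaration, for example in the content model of r. -->
]>
<r>text</r>
```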
Unparsed entities
An unparsed entity is declared like an external entity, followed by the NDATA keyword and the name of a notation.
<!ENTITY foo SYSTEM "something.txt" NDATA txt>
NDATA (notation data) means that the entity's content is not parsed as XML; it is associated with the named notation, which must be declared.
Unparsed entities can be referenced only in attribute values that are declared to be of type ENTITY or ENTITIES.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document [
  <!ELEMENT document (sound)+>
  <!ELEMENT sound EMPTY>
  <!ATTLIST sound what ENTITY #REQUIRED >
  <!NOTATION WAV SYSTEM "/usr/local/bin/wave_player">
  <!ENTITY music SYSTEM "drop.wav" NDATA WAV>
  <!ENTITY wwwc "World Wide Web Consortium">
]>

<document>
  <sound what="music"/>
</document>

<!ELEMENT>

  • <!ELEMENT xyz EMPTY>
    An empty element contains neither text nor elements, but it may have attributes.
     
  • <!ELEMENT xyz ANY>
    An element declared ANY may contain text and any declared elements, in any order.
    <?xml version="1.0" ?>
    <!DOCTYPE r [
    
      <!ELEMENT r ANY >
      <!ELEMENT a ANY >
      <!ELEMENT b ANY >
      <!ELEMENT c (a*)>
      <!ELEMENT d (b*)>
    
    ]>
    <r>
    
      <a>
        <b>
          <a></a><a></a><b></b>
        </b>
        <c>
           <a>
             <b></b>
           </a>
        </c>
        <a>
          <a></a><b></b>
        </a>
      </a>
    </r>
    
  • <!ELEMENT xyz (abc,def,ghi)>
    The element xyz must contain the elements abc, def and ghi in exactly this order.
  • <!ELEMENT xyz (abc|def)>
    xyz must contain exactly one element, either abc or def.
     
  • <!ELEMENT xyz (abc*)>
    xyz contains abc which is repeated any number of times, possibly zero times.
    So, <xyz></xyz>, <xyz><abc></abc></xyz>, or <xyz><abc></abc><abc></abc><abc></abc></xyz> would all be correct.
     
  • <!ELEMENT xyz (abc+)>
    The + works like the star (*), except that abc must occur at least once.
     
  • <!ELEMENT xyz (abc?)>
    xyz optionally contains an abc element. If there is an abc element, it occurs only once.
     
  • <!ELEMENT xyz (#PCDATA)>
    xyz only contains text, no other elements.
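  • <!ELEMENT xyz (#PCDATA|A)*>
    Mixed content: xyz contains text interspersed with any number of A elements. The trailing * is mandatory in this form.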
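
The content models above can be combined in one DTD. The following sketch is a made-up document (the element names note, title, para, em, sig, name and initials are illustrative only) that uses a sequence, a choice, the occurrence indicators and mixed content:

```xml
<?xml version="1.0"?>
<!DOCTYPE note [
  <!-- Sequence: title first, then one or more para, then an optional sig. -->
  <!ELEMENT note  (title, para+, sig?)>
  <!ELEMENT title (#PCDATA)>
  <!-- Mixed content: text interspersed with em elements. -->
  <!ELEMENT para  (#PCDATA|em)*>
  <!ELEMENT em    (#PCDATA)>
  <!-- Choice: exactly one of name or initials. -->
  <!ELEMENT sig      (name|initials)>
  <!ELEMENT name     (#PCDATA)>
  <!ELEMENT initials (#PCDATA)>
]>
<note>
  <title>Reminder</title>
  <para>Validate <em>before</em> publishing.</para>
  <para>Second paragraph.</para>
  <sig><initials>XY</initials></sig>
</note>
```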

<!ATTLIST>

Attributes indicate properties of Elements.
  • <!ATTLIST elem att CDATA #REQUIRED>
    The element elem must have the attribute att.
     
  • <!ATTLIST elem att CDATA #IMPLIED>
    The attribute att is optional for the element elem.
     
  • <!ATTLIST elem att CDATA "def">
    The attribute att's default value is def. It can be overridden in the document.
     
  • <!ATTLIST elem att CDATA #FIXED "def">
    The attribute att always has the value def.
    It can be left out or stated explicitly in the XML document, but only with the value def.
  • <!ATTLIST elem weekday (mo|tu|we|th|fr|sa|su) #REQUIRED >
    The attribute weekday has an enumerated value domain: its value must be one of the listed tokens.
<?xml version="1.0" ?>
<!DOCTYPE r
[
  <!ELEMENT r (a*)     >
  <!ELEMENT a (#PCDATA)>

  <!ATTLIST a a_1 CDATA                  #REQUIRED       >
  <!ATTLIST a a_2 CDATA                  #IMPLIED        >
  <!ATTLIST a a_3 CDATA                  "def"           >
  <!ATTLIST a a_4 CDATA                  #FIXED    "def" >
  <!ATTLIST a a_5 (enum_1|enum_2|enum_3) #REQUIRED       >
  <!ATTLIST a a_6 (enum_1|enum_2|enum_3) #IMPLIED        >
  <!ATTLIST a a_7 (enum_4|enum_5|enum_6) "enum_5"        >
]>
<r> <a a_5="enum_1" a_1="Y" />
</r>
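As a worked example of the defaulting rules: once a parser that reads the DTD has processed the a element above, the application sees the attributes as if the element had been written as follows. a_3, a_4 and a_7 are filled in from their declared defaults, while the #IMPLIED attributes a_2 and a_6 stay absent.

```xml
<a a_1="Y" a_3="def" a_4="def" a_5="enum_1" a_7="enum_5"/>
```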
Types of attributes:
  • CDATA
    CDATA is text (character data). The only limitation is that a literal < is not allowed and that & may only start an entity or character reference.
     
  • ID
    The value of such an attribute must be an XML name and must be unique among all ID values in the XML document in which it occurs.
    At most one attribute of an element type can have the type ID.
    An ID attribute must be declared #IMPLIED or #REQUIRED; it cannot have a default or #FIXED value, because a shared value would not be unique.
     
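  • IDREF
    The value must match the value of an ID attribute somewhere in the same document.
     
  • IDREFS
    A whitespace-separated list of IDREF values.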
     
  • ENTITY
    <!ATTLIST sales graph ENTITY #IMPLIED>
    <!ENTITY graph_sales SYSTEM "sales_jan_04.gif" NDATA gif>
    
    <!-- later in the document -->
    
    <sales graph="graph_sales">...
    </sales>
    
    The GIF needs to be updated in one place only.
     
  • ENTITIES
    A whitespace-separated list of unparsed entity names.
     
  • NMTOKEN
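    The value must be a name token: one or more name characters (letters, digits, '.', '-', '_', ':'). Unlike an XML name, it may start with any of them, for example a digit.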
     
  • NOTATION
    The value must be one of the notation names listed in the declaration, and each listed name must be a declared notation.
    <?xml version="1.0"?>
    <!DOCTYPE d [
      <!ELEMENT d (picture)*>
      <!-- not EMPTY: NOTATION attributes are not allowed on EMPTY elements -->
      <!ELEMENT picture (#PCDATA)>
      <!ATTLIST picture type NOTATION (gif|jpg) "gif">
      <!NOTATION gif SYSTEM "gifviewer.exe">
      <!NOTATION jpg SYSTEM "jpgviewer.exe">
    ]>
    
    <d>
      <picture type="gif"/>
    </d>
    

     
  • enum
    <!ATTLIST greetings word (hello|hallo|hi|ciao|salut) #IMPLIED>
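The ID, IDREF and IDREFS types work together. A minimal sketch, with made-up element and attribute names (team, person, id, boss, friends):

```xml
<?xml version="1.0"?>
<!DOCTYPE team [
  <!ELEMENT team   (person+)>
  <!ELEMENT person (#PCDATA)>
  <!ATTLIST person
    id      ID     #REQUIRED
    boss    IDREF  #IMPLIED
    friends IDREFS #IMPLIED>
]>
<team>
  <person id="p1">Ann</person>
  <!-- boss must match an existing ID value. -->
  <person id="p2" boss="p1">Bob</person>
  <!-- friends is a whitespace-separated list of ID values. -->
  <person id="p3" boss="p1" friends="p1 p2">Cy</person>
</team>
```

A validating parser would reject boss="p9" here, because no element carries the ID p9.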
Attribute values are subject to attribute-value normalization.
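A small sketch of what normalization does, assuming a parser that reads the declarations in the internal subset (the names r, c and t are made up). In a CDATA attribute each whitespace character becomes a space. For any other type, leading and trailing spaces are also removed and runs of spaces collapse to one.

```xml
<?xml version="1.0"?>
<!DOCTYPE r [
  <!ELEMENT r EMPTY>
  <!ATTLIST r c CDATA    #IMPLIED
              t NMTOKENS #IMPLIED>
]>
<r c="  a
b  " t="  x   y  "/>
```

Here c is reported as two spaces, a, one space, b, two spaces (the line break became a space), while t is reported as x y.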
See also nesting entities.

<!NOTATION>

Declares a notation type. A notation names the format of (possibly binary) non-XML data; its identifier tells the application how to process that data, for example by naming a helper program.
<!NOTATION jpg SYSTEM "jpgviewer.exe">
<!NOTATION gif SYSTEM "gif.exe">
Here, jpg and gif, respectively, would be the notation types.
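The identifier in a notation declaration does not have to name a program. Two other conventions are sometimes used, sketched here as assumptions rather than requirements: a MIME type as the system identifier, or a public identifier with no system identifier. The notation names and the public identifier string below are illustrative only.

```xml
<!-- System identifier that names a media type instead of a program. -->
<!NOTATION png SYSTEM "image/png">
<!-- Public identifier only; no system identifier follows. -->
<!NOTATION tex PUBLIC "+//Example//NOTATION TeX source//EN">
```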

<![CDATA[ ]]>
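
A CDATA section is document syntax rather than a DTD declaration: inside it, < and & are taken literally, so markup-like text needs no escaping. A minimal sketch with a made-up element name code:

```xml
<?xml version="1.0"?>
<code><![CDATA[if (a < b && b < c) { return "<ok/>"; }]]></code>
```

The only character sequence that cannot appear inside the section is ]]>, because it ends the section.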

Disadvantages of DTDs

  • Syntax not expressible in XML
  • No datatypes
  • Limited possibilities to express the cardinality of elements
  • Closed: DTDs cannot be mixed. DTDs, except for entities, cannot be extended.
  • No support for namespaces.
  • No support for inheritance
XML Schema tries to address some of these disadvantages.
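For comparison, a minimal XML Schema sketch (the element names order, item and quantity are made up) that expresses things a DTD handles poorly or not at all: a datatype for text content and an exact occurrence range.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <!-- Between 1 and 5 item elements; a DTD offers only ?, * and +. -->
        <xs:element name="item" type="xs:string"
                    minOccurs="1" maxOccurs="5"/>
        <!-- Datatype: the content must be a positive integer. -->
        <xs:element name="quantity" type="xs:positiveInteger"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The schema itself is an XML document, which also addresses the first point in the list above.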