Introduction
Extensible Markup Language (XML) is a simplified subset of the Standard Generalized Markup Language (SGML). It was created in the mid-1990s for specifying documents that can be exchanged and automatically processed by machines ([6], [7] Bray et al., 2006a, b). One advantage of XML as a data exchange format is that its validation technology is standardized, which benefits both parties exchanging XML data on the web. As a result, XML is fast emerging as the dominant standard for data representation and exchange over the internet.
Document type definition (DTD) was the first schema language for XML; it is built into the XML specification and remains widely used ([6], [7] Bray et al., 2006a, b). Its technical underpinnings come from the theory of formal languages, and general-purpose parsers that can validate any document against any DTD are well known ([2] Bernstein et al., 2005). A DTD defines a schema as a set of rules, which makes it concise and easy to read. However, its support for schema structure is minimal and its expressive power is rather restricted ([17] Lee and Chu, 2002).
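To make the rule-based style of a DTD concrete, the following is a minimal sketch in Python that reads the declarations of a small DTD through the standard `xml.parsers.expat` module. The `book`, `title`, `author` and `isbn` names and the helper `read_dtd_rules` are hypothetical and chosen only for illustration. Note that expat is a non-validating parser: it reports the declarations but does not check the document against them.

```python
import xml.parsers.expat

# Hypothetical document with an internal DTD subset (illustration only).
DOC = """<!DOCTYPE book [
  <!ELEMENT book (title, author+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ATTLIST book isbn CDATA #REQUIRED>
]>
<book isbn="any-string"><title>T</title><author>A</author></book>
"""


def read_dtd_rules(text):
    """Collect the element and attribute declarations found in the DTD."""
    elements = []
    attributes = []
    parser = xml.parsers.expat.ParserCreate()

    def on_element(name, model):
        # 'model' describes the content model; only the name is kept here.
        elements.append(name)

    def on_attlist(elname, attname, type_, default, required):
        # 'type_' is the declared DTD type of the attribute, e.g. "CDATA".
        attributes.append((elname, attname, type_, bool(required)))

    parser.ElementDeclHandler = on_element
    parser.AttlistDeclHandler = on_attlist
    parser.Parse(text, True)
    return elements, attributes


elements, attributes = read_dtd_rules(DOC)
print(elements)
print(attributes)
```

Each rule is a single declaration, which is what keeps a DTD short. The `isbn` attribute can only be declared as plain `CDATA`, so the DTD itself has no way to say that its value must follow a particular format.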
With its rapid development, XML has made it possible to treat web documents as data sources that can be queried (as with database relations) and related to each other through semantically meaningful links (as with foreign-key constraints). It was at this point that XML began to outgrow its SGML heritage. One of the first enhancements was XML namespaces ([6], [7] Bray et al., 2006a, b). Another important enhancement was the XML Schema specification ([8] Brown et al., 2001), which is designed to rectify many of the limitations of DTD as a data definition language ([2] Bernstein et al., 2005). These limitations include the following:
- DTD does not support namespaces.
- The syntax of DTD is quite different from that of XML documents, so standard XML tools cannot be used to manipulate DTD schemas.
- DTD supports only a few built-in types, such as CDATA and PCDATA; it does not support user-defined types and cannot constrain character data.
- DTD provides only limited means for expressing data-consistency constraints. It does not have keys (except for the limited ID type), and the mechanism for specifying referential...