Table of Contents
| Acknowledgements |
| Chapter 1 | Schemas: An introduction |
| 1.1 | What is an XML schema? |
| 1.2 | The purpose of schemas |
| 1.3 | Schema design |
| 1.4 | Schema languages |
| Chapter 2 | A quick tour of XML Schema |
| 2.1 | An example schema |
| 2.2 | The components of XML Schema |
| 2.3 | Elements and attributes |
| 2.4 | Data types |
| 2.5 | Simple types |
| 2.6 | Complex types |
| 2.7 | Namespaces and XML Schema |
| 2.8 | Schema composition |
| 2.9 | Instances and schemas |
| 2.10 | Annotations |
| 2.11 | Advanced features |
| Chapter 3 | Namespaces |
| 3.1 | Namespaces in XML |
| 3.2 | The relationship between namespaces and schemas |
| 3.3 | Using namespaces in XSDL |
| Chapter 4 | Schema composition |
| 4.1 | Modularizing schema documents |
| 4.2 | Defining schema documents |
| 4.3 | Schema assembly |
| 4.4 | Include, redefine, and import |
| Chapter 5 | Instances and schemas |
| 5.1 | Using the instance attributes |
| 5.2 | Schema processing |
| 5.3 | Relating instances to schemas |
| 5.4 | Using XSDL hints in the instance |
| 5.5 | Dereferencing namespaces |
| 5.6 | The root element |
| 5.7 | Using DTDs and schemas together |
| 5.8 | Using specific schema processors |
| Chapter 6 | Schema documentation and extension |
| 6.1 | The mechanics |
| 6.2 | User documentation |
| 6.3 | Application information |
| 6.4 | Notations |
| Chapter 7 | Element declarations |
| 7.1 | Global and local element declarations |
| 7.2 | Declaring the data types of elements |
| 7.3 | Default and fixed values |
| 7.4 | Nils and nillability |
| 7.5 | Qualified vs. unqualified forms |
| Chapter 8 | Attribute declarations |
| 8.1 | Global and local attribute declarations |
| 8.2 | Assigning types to attributes |
| 8.3 | Default and fixed values |
| 8.4 | Qualified vs. unqualified forms |
| Chapter 9 | Simple types |
| 9.1 | Simple type varieties |
| 9.2 | Simple type definitions |
| 9.3 | Simple type restrictions |
| 9.4 | Facets |
| 9.5 | Preventing simple type derivation |
| Chapter 10 | Regular expressions |
| 10.1 | The structure of a regular expression |
| 10.2 | Atoms |
| 10.3 | Quantifiers |
| Chapter 11 | Union and list types |
| 11.1 | Varieties and derivation types |
| 11.2 | Union types |
| 11.3 | List types |
| Chapter 12 | Built-in simple types |
| 12.1 | Built-in types |
| 12.2 | String-based types |
| 12.3 | Numeric types |
| 12.4 | Date and time types |
| 12.5 | Legacy types |
| 12.6 | Other types |
| 12.7 | Type equality |
| Chapter 13 | Complex types |
| 13.1 | What are complex types? |
| 13.2 | Defining complex types |
| 13.3 | Content types |
| 13.4 | Using element types |
| 13.5 | Using model groups |
| 13.6 | Using attributes |
| Chapter 14 | Deriving complex types |
| 14.1 | Why derive types? |
| 14.2 | Restriction and extension |
| 14.3 | Simple content and complex content |
| 14.4 | Complex type extensions |
| 14.5 | Complex type restrictions |
| 14.6 | Type substitution |
| 14.7 | Controlling type derivation and substitution |
| Chapter 15 | Reusable groups |
| 15.1 | Why reusable groups? |
| 15.2 | Named model groups |
| 15.3 | Attribute groups |
| 15.4 | Reusable groups vs. complex type derivations |
| Chapter 16 | Substitution groups |
| 16.1 | Why substitution groups? |
| 16.2 | The substitution group hierarchy |
| 16.3 | Declaring a substitution group |
| 16.4 | Type constraints for substitution groups |
| 16.5 | Alternatives to substitution groups |
| 16.6 | Controlling substitution groups |
| Chapter 17 | Identity constraints |
| 17.1 | Identity constraint categories |
| 17.2 | Design hint: Should I use ID/IDREF or key/keyref? |
| 17.3 | Structure of an identity constraint |
| 17.4 | Uniqueness constraints |
| 17.5 | Key constraints |
| 17.6 | Key references |
| 17.7 | Selectors and fields |
| 17.8 | The XML Schema XPath subset |
| 17.9 | Identity constraints and namespaces |
| Chapter 18 | Redefining schema components |
| 18.1 | Redefinition basics |
| 18.2 | The mechanics of redefinition |
| 18.3 | Redefining simple types |
| 18.4 | Redefining complex types |
| 18.5 | Redefining named model groups |
| 18.6 | Redefining attribute groups |
| Chapter 19 | Topics for DTD users |
| 19.1 | Element declarations |
| 19.2 | Attribute declarations |
| 19.3 | Notations |
| 19.4 | Parameter entities for reuse |
| 19.5 | Parameter entities for extensibility |
| 19.6 | External parameter entities |
| 19.7 | General entities |
| 19.8 | Comments |
| 19.9 | Using DTDs and schemas together |
| Chapter 20 | Naming considerations |
| 20.1 | Naming guidelines |
| 20.2 | Qualified vs. unqualified names |
| 20.3 | Structuring namespaces |
| 20.4 | Multiple languages |
| Chapter 21 | Extensibility and reuse |
| 21.1 | Reuse |
| 21.2 | Extending schemas |
| 21.3 | Versioning of schemas |
| 21.4 | Designing applications to support change |
| Appendix A | Table of XSDL keywords |
| A.1 | XSDL element types |
| A.2 | XSDL attributes |
| Appendix B | Built-in simple types |
| B.1 | Built-in simple types |
| Index |
Read an Excerpt
Chapter 9: Simple types
Both element and attribute declarations can use simple types
to describe the data content of the components. This chapter
introduces simple types, and explains how to define your
own atomic simple types for use in your schemas.
9.1 Simple type varieties
There are three varieties of simple type: atomic types, list types, and
union types.
- Atomic types have values that are indivisible, such as 10 and
large.
- List types have values that are whitespace-separated lists of
atomic values, such as <availableSizes>10 large 2</availableSizes>.
- Union types may have values that are either atomic values or list
values. What differentiates them is that the set of valid values, or "value space," for the type is the union of the value spaces of
two or more other simple types. For example, to represent a
dress size, you may define a union type that allows a value to
be either an integer from 2 through 18, or one of the string
values small, medium, or large.
List and union types are covered in Chapter 11, "Union and list
types."
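The three varieties can be sketched side by side in a small schema document. This is only an illustration built from the dress size example above: the type names NumericSizeType, NamedSizeType, SizeType, and AvailableSizesType are hypothetical, and the choice of xsd:token as the base for the string values is an assumption.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Atomic type: an integer from 2 through 18 -->
  <xsd:simpleType name="NumericSizeType">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="2"/>
      <xsd:maxInclusive value="18"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Atomic type: one of small, medium, or large -->
  <xsd:simpleType name="NamedSizeType">
    <xsd:restriction base="xsd:token">
      <xsd:enumeration value="small"/>
      <xsd:enumeration value="medium"/>
      <xsd:enumeration value="large"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Union type: its value space is the union of the two types above -->
  <xsd:simpleType name="SizeType">
    <xsd:union memberTypes="NumericSizeType NamedSizeType"/>
  </xsd:simpleType>

  <!-- List type: a whitespace-separated list of SizeType values -->
  <xsd:simpleType name="AvailableSizesType">
    <xsd:list itemType="SizeType"/>
  </xsd:simpleType>

  <xsd:element name="availableSizes" type="AvailableSizesType"/>

</xsd:schema>
```

Under these assumptions, <availableSizes>10 large 2</availableSizes> would be a valid instance, because each item in the list matches one of the member types of the union.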
9.1.1 Design hint: How much should I break down my data values?
Data values should be broken down to the most atomic level possible.
This allows them to be processed in a variety of ways for different uses,
such as display, mathematical operations, and validation. It is much
easier to concatenate two data values back together than it is to split
them apart. In addition, more granular data is much easier to validate.
It is a fairly common practice to put a data value and its units in
the same element, for example <length>3cm</length>. However,
the preferred approach is to have a separate data value,
preferably an attribute, for the units, for example
<length units="cm">3</length>.
Using a single concatenated value is limiting because:
- It is extremely cumbersome to validate. You have to apply a
complicated pattern that would need to change every time a
unit type is added.
- You cannot perform comparisons, conversions, or mathematical
operations on the data without splitting it apart.
- If you want to display the data item differently (for example, as
"3 centimeters" or "3 cm" or just "3"), you have to split it apart.
This complicates the stylesheets and applications that process
the instance document.
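As a rough sketch of the preferred approach, the declaration below keeps the numeric value as the element content and moves the units into an attribute. The xsd:decimal base, the required units attribute, and the particular unit codes (cm, mm, in) are assumptions made for illustration. The declaration also uses a complex type with simple content, a construct beyond the scope of this chapter.

```xml
<xsd:element name="length">
  <xsd:complexType>
    <xsd:simpleContent>
      <!-- The element content is a plain number that can be compared or converted -->
      <xsd:extension base="xsd:decimal">
        <!-- The units are validated separately; adding a unit means adding one enumeration -->
        <xsd:attribute name="units" use="required">
          <xsd:simpleType>
            <xsd:restriction base="xsd:token">
              <xsd:enumeration value="cm"/>
              <xsd:enumeration value="mm"/>
              <xsd:enumeration value="in"/>
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:attribute>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:element>
```

With a declaration along these lines, <length units="cm">3</length> is valid, and neither the number nor the unit needs a pattern to be checked.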
It is possible to go too far, though. For example, you may break a
date down as follows:
<orderDate>
  <year>2001</year>
  <month>06</month>
  <day>15</day>
</orderDate>
This is probably overkill unless you have a special need to process
these items separately.
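One possible alternative, assuming the built-in xsd:date type meets your needs, is to keep the date as a single value. The sketch below shows a hypothetical declaration followed by a matching instance fragment. The built-in type already validates the year, month, and day as a unit.

```xml
<!-- Schema fragment: the date is one atomic value of the built-in date type -->
<xsd:element name="orderDate" type="xsd:date"/>

<!-- Instance fragment: the lexical form is YYYY-MM-DD -->
<orderDate>2001-06-15</orderDate>
```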
9.2 Simple type definitions
9.2.1 Named simple types
Simple types can be either named or anonymous. Named simple types
are always defined globally (i.e., their parent is always schema or
redefine) and are required to have a name that is unique among the
data types (both simple and complex) in the schema. The XSDL syntax
for a named simple type definition is shown in Table 9–1.
The name of a simple type must be an XML non-colonized name,
which means that it must start with a letter or underscore, and may
only contain letters, digits, underscores, hyphens, and periods. You
cannot include a namespace prefix when defining the type; it takes its
namespace from the target namespace of the schema document.
All of the examples of named types in this book have the word "Type"
at the end of their names, to clearly distinguish them from element-type
names and attribute names. However, this is not a requirement;
you may in fact have a data type definition and an element declaration
using the same name.
Example 9–1 shows the definition of a named simple type DressSizeType,
along with an element declaration that references it. Named
types can be used in multiple element and attribute declarations.
Example 9–1. Defining and referencing a named simple type
<xsd:simpleType name="DressSizeType">
  <xsd:restriction base="xsd:integer">
    <xsd:minInclusive value="2"/>
    <xsd:maxInclusive value="18"/>
  </xsd:restriction>
</xsd:simpleType>
<xsd:element name="size" type="DressSizeType"/>
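To illustrate the naming rules and the reuse of a named type, here is a sketch of a complete schema document. The namespace URI http://example.org/prod and the names alternateSize and defaultSize are hypothetical. The sketch assumes the target namespace is also declared as the default namespace, so the unprefixed references to DressSizeType resolve to it.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://example.org/prod"
            targetNamespace="http://example.org/prod">

  <!-- The name has no prefix; the type takes the target namespace -->
  <xsd:simpleType name="DressSizeType">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="2"/>
      <xsd:maxInclusive value="18"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- The same named type is referenced by several declarations -->
  <xsd:element name="size" type="DressSizeType"/>
  <xsd:element name="alternateSize" type="DressSizeType"/>
  <xsd:attribute name="defaultSize" type="DressSizeType"/>

</xsd:schema>
```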
9.2.2 Anonymous simple types
Anonymous types, on the other hand, must not have names. They are
always defined entirely within an element or attribute declaration, and
may only be used once, by that declaration. Defining a type anonymously
prevents it from ever being restricted, used in a list or union, or redefined. The XSDL syntax to define an anonymous simple type is
shown in Table 9–2.
Example 9–2 shows the definition of an anonymous simple type
within an element declaration.
Example 9–2. Defining an anonymous simple type
<xsd:element name="size">
  <xsd:simpleType>
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="2"/>
      <xsd:maxInclusive value="18"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:element>
9.2.3 Design hint: Should I use named or anonymous types?
The advantage of named types is that they may be defined once and
used many times. For example, you may define a type named ProductCodeType
that lists all of the valid product codes in your organization.
This type can then be used in many element and attribute declarations
in many schemas. This has the advantages of:
- encouraging consistency throughout the organization,
- reducing the possibility of error,
- requiring less time to define new schemas,
- simplifying maintenance, because new product codes need only
be added in one place.
Named types can also make the schema more readable when the
type definitions are complex.
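A minimal sketch of such a shared type follows. The product codes (P-100, P-200, P-300) and the declaration names productCode and relatedProduct are invented for illustration; a real list would come from your organization.

```xml
<!-- Defined once: adding a product code means adding one enumeration here -->
<xsd:simpleType name="ProductCodeType">
  <xsd:restriction base="xsd:token">
    <xsd:enumeration value="P-100"/>
    <xsd:enumeration value="P-200"/>
    <xsd:enumeration value="P-300"/>
  </xsd:restriction>
</xsd:simpleType>

<!-- Used many times, by both element and attribute declarations -->
<xsd:element name="productCode" type="ProductCodeType"/>
<xsd:attribute name="relatedProduct" type="ProductCodeType"/>
```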
An anonymous type, on the other hand, can be used only in the
element or attribute declaration that contains it. It can never be
redefined, have types derived from it, or be used in a list or union type.
This can seriously limit its reusability, extensibility, and ability to change
over time.
However, there are cases where anonymous types are preferable to
named types. If the type is unlikely to ever be reused, the advantages
listed above no longer apply. Also, there is such a thing as too much
reuse. For example, if an element can contain the values 1 through 10,
it does not make sense to try to define a data type named OneToTenType
that is reused by other unrelated element declarations with the
same value space. If the value space for one of the element declarations
that uses the named data type changes, but the other element declarations
do not change, it actually makes maintenance more difficult,
because a new data type needs to be defined at that time.
In addition, anonymous types can be more readable when they are
relatively simple. It is sometimes desirable to have the definition of the
data type right there with the element or attribute declaration....