Table of Contents
| Ch. 1 | XPath 2.0 in context | 1 |
| Ch. 2 | The data model | 27 |
| Ch. 3 | The type system | 61 |
| Ch. 4 | The evaluation context | 115 |
| Ch. 5 | Basic constructs | 133 |
| Ch. 6 | Operators on items | 169 |
| Ch. 7 | Path expressions | 201 |
| Ch. 8 | Sequence expressions | 239 |
| Ch. 9 | Type expressions | 261 |
| Ch. 10 | XPath functions | 291 |
| Ch. 11 | Regular expressions | 447 |
| App. A | XPath 2.0 syntax summary | 459 |
| App. B | Operator precedence | 467 |
| App. C | Compatibility with XPath 1.0 | 469 |
| App. D | Error codes | 475 |
XPath 2.0 Programmer's Reference
By Michael Kay
John Wiley & Sons
ISBN: 0-7645-6910-4
Chapter One
Sequence Expressions
One of the most notable innovations in XPath 2.0 is the ability to construct and manipulate sequences. This chapter is devoted to an explanation of the constructs in the language that help achieve this.
Sequences can consist either of nodes, or of atomic values, or of a mixture of the two. Sequences containing nodes only are a generalization of the node-sets offered by XPath 1.0. In the previous chapter we looked at the operators for manipulating node-sets, in particular, path expressions, and the operators «union», «intersect», and «except».
In this chapter we look at constructs that can manipulate any sequence, whether it contains nodes, atomic values, or both. Specifically, the chapter covers the following constructs:
Sequence concatenation operator: «,»
Numeric range operator: «to»
Filter expressions: «a[b]»
Mapping expressions: «for»
Quantified expressions: «some» and «every»
First, some general remarks about sequences.
Sequences (unlike nodes) do not have any concept of identity. Given two values that are both sequences, you can ask (in various ways) whether they have the same contents, but you cannot ask whether they are the same sequence.
Sequences are immutable. This is part of what it means for a language to be free of side effects. You can write expressions that take sequences as input and produce new sequences as output, but you can never modify an existing sequence in place.
Sequences cannot be nested. If you want to construct trees, build them as XML trees using nodes rather than atomic values.
A single item is a sequence of length one, so any operation that applies to sequences also applies to single items.
Sequences do not have any kind of type label that is separate from the type labels attached to the items in the sequence. As we will see in Chapter 9, you can ask whether a sequence is an instance of a particular sequence type, but the question can be answered simply by looking at the number of items in the sequence, and at the type labels attached to each item. It follows that there is no such thing as (for example) an "empty sequence of integers" as distinct from an "empty sequence of strings". If the sequence has no items in it, then it also carries no type label. This has some real practical consequences, for example, the sum() function, when applied to an expression that can only ever return a sequence of xs:duration values, will return the integer 0 (not the zero-length duration) when the sequence is empty, because there is no way at runtime of knowing that if the sequence hadn't been empty, its items would have been durations.
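The sum() consequence can be modelled outside XPath. The following Python sketch is an analogy only, not XPath: xpath_sum is a hypothetical helper, and datetime.timedelta stands in for xs:duration. It shows why an empty input yields the integer 0: there are no items whose type label could be consulted.

```python
from datetime import timedelta
from functools import reduce
import operator

def xpath_sum(items):
    """Model of sum() over a flat sequence (hypothetical helper, not a real API)."""
    items = list(items)
    if not items:
        # No items means no type labels to consult, so the result is the integer 0.
        return 0
    # Non-empty: the type of the result follows from the items themselves.
    return reduce(operator.add, items)

durations = [timedelta(hours=1), timedelta(minutes=30)]
print(xpath_sum(durations))   # a duration
print(repr(xpath_sum([])))    # the integer 0, not a zero-length duration
```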
Functions and operators that attach position numbers to the items in a sequence always identify the first item as number 1 (one), not zero. (Although programming with a base of zero tends to be more convenient, Joe Public has not yet been educated into thinking of the first paragraph in a chapter as paragraph zero, and the numbering convention was chosen with this in mind.)
This chapter covers the language constructs that handle general sequences, but there are also a number of useful functions available for manipulating sequences, and these are described in Chapter 10. Relevant functions include: count(), deep-equal(), distinct-values(), empty(), exists(), index-of(), insert-before(), remove(), subsequence(), and unordered().
The Comma Operator
The comma operator can be used to construct a sequence by concatenating items or sequences. We already saw the syntax in Chapter 5, because it appears right at the top level of the XPath grammar:
Expression syntax:
Expr ::= ExprSingle («,» ExprSingle)*
ExprSingle ::= ForExpr | QuantifiedExpr | IfExpr | OrExpr
Although the production rule ExprSingle lists four specific kinds of expression that can appear as an operand of the «,» operator, these actually cover any XPath expression whatsoever, provided it does not contain a top-level «,».
Because the «,» symbol also has other uses in XPath (for example, it is used to separate the arguments in a function call, and also to separate clauses in «for», «some», and «every» expressions, which we will meet later in this chapter), there are many places in the grammar where use of a general Expr is restricted, and only an ExprSingle is allowed. In fact, the only places where a general Expr (one that contains a top-level comma) is allowed are:
As the top-level XPath expression
Within a parenthesized expression
Within the parentheses of an «if» expression
Within square brackets as a predicate
Neither of the last two is particularly useful, so in practice the rule is: if you want to use the comma operator to construct a list, then it must either be at the outermost level of the XPath expression, or it must be written in parentheses.
For example, the max() function expects a single argument, which is a sequence. If you want to find the maximum of three values $a, $b, and $c, you can write:
max(($a, $b, $c))
The outer parentheses are part of the function call syntax; the inner parentheses are needed because the expression «max($a, $b, $c)» would be a function call with three parameters rather than one, which would be an error.
XPath does not use the JavaScript convention whereby a function call with three separate parameters is the same as a function call whose single parameter is a sequence containing three items.
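The difference between one sequence-valued argument and three separate arguments can be imitated in Python. This is a sketch by analogy: xpath_max is a hypothetical wrapper that accepts exactly one parameter (Python's built-in max() is more permissive and accepts both forms, which is why the wrapper is needed).

```python
def xpath_max(seq):
    """Model of max(): exactly one parameter, which is a sequence (hypothetical helper)."""
    return max(seq)

a, b, c = 3, 7, 5
print(xpath_max((a, b, c)))   # one argument: a sequence of three items

try:
    xpath_max(a, b, c)        # three arguments: rejected, as in XPath
except TypeError as err:
    print("error:", err)
```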
The operands of the «,» operator can be any two sequences. Of course, a single item is itself a sequence, so the operands can also be single items. Either of the sequences can be empty, in which case the result of the expression is the value of the other operand.
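These rules can be sketched as a small Python model. It is an illustration under assumptions, not XPath: sequences are represented as tuples, and seq and comma are hypothetical helper names. The model treats a single item as a sequence of length one, flattens any nesting, and returns the other operand unchanged when one side is empty.

```python
def seq(value):
    """Normalize a value to a flat sequence; a single item becomes a sequence of length one."""
    if isinstance(value, (list, tuple)):
        flat = []
        for v in value:
            flat.extend(seq(v))   # sequences cannot be nested, so flatten
        return tuple(flat)
    return (value,)

def comma(left, right):
    """Model of the «,» operator: concatenate two sequences, keeping their order."""
    return seq(left) + seq(right)

print(comma(1, 2))             # (1, 2)
print(comma((1, 2), (3, 4)))   # (1, 2, 3, 4)
print(comma((), "N/A"))        # ('N/A',)
```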
The comma operator is often used to construct a list, as in:
if ($status = ('current', 'pending', 'deleted', 'closed')) then ...
which tests whether the variable $status has one of the given four values (recall from Chapter 6 that the «=» operator compares each item in the sequence on the left with each item in the sequence on the right, and returns true if any of these pairs match). In this construct, you probably aren't thinking of «,» as being a binary operator that combines two operands to produce a result, but that's technically what it is. The expression «A,B,C,D» technically means «(((A,B),C),D)», but since list concatenation is associative, you don't need to think of it this way.
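The "any pair matches" rule for «=» can be modelled in a few lines of Python. This is a sketch, not XPath: general_eq is a hypothetical helper, sequences are tuples, and Python's == stands in for the item comparison without reproducing XPath's type rules.

```python
def general_eq(left, right):
    """Model of the general comparison «=»: true if any pair of items is equal."""
    return any(l == r for l in left for r in right)

allowed = ("current", "pending", "deleted", "closed")
print(general_eq(("pending",), allowed))    # True
print(general_eq(("archived",), allowed))   # False
print(general_eq((), allowed))              # False
```

The last call shows that an empty sequence on either side can never produce a matching pair.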
The order of the items in the two sequences is retained in the result. This is true even if the operands are nodes: there is no sorting into document order. This means that in XSLT, for example, you can use a construct such as:
<xsl:apply-templates select="title, author, abstract"/>
to process the selected elements in a specified order, regardless of the order in which they appear in the source document. This example is not necessarily processing exactly three elements: there might, for example, be five authors and no abstract. Since the path expression «author» selects the five authors in document order, they will be processed in this order, but they will be processed after the <title> element whether they precede or follow the title in the source document.
Examples
Here are some examples of expressions that make use of the «,» operator to construct sequences.
| Expression | Effect |
| max(($net, $gross)) | Selects whichever of $net and $gross is larger, comparing them according to their actual data type (and using the default collation if they are strings) |
| for $i in (1 to 4, 8, 13) return $seq[$i] | Selects the items at positions 1, 2, 3, 4, 8, and 13 of the sequence $seq. For the meaning of the «to» operator, see the next section |
| string-join((@a, @b, @c), "-") | Creates a string containing the values of the attributes @a, @b, and @c of the context node (in that order), separated by hyphens |
| (@code, "N/A")[1] | Returns the code attribute of the context node if it has such an attribute, or the string "N/A" otherwise. This expression makes use of the fact that when the code attribute is absent, the value of @code is an empty sequence, and concatenating an empty sequence with another sequence returns the other sequence (in this case the singleton string "N/A") unchanged. The predicate in square brackets makes this a filter expression: filter expressions are described later in this chapter, on page 244 |
| book/(author, title, isbn) | Returns a sequence containing the <author>, <title>, and <isbn> children of a <book> element, in document order. Although the «,» operator retains the order as specified, the «/» operator causes the nodes to be sorted into document order. So in this case the «,» operator is exactly equivalent to the union operator «|» |
Numeric Ranges: The «to» Operator
A range expression has the syntax:
Expression syntax:
RangeExpr ::= AdditiveExpr ( «to» AdditiveExpr )?
The effect is to return a sequence of consecutive integers in ascending order. For example, the expression «1 to 5» returns the sequence «1,2,3,4,5».
The operands do not have to be constants, of course. A common idiom is to use an expression such as «1 to count($seq)» to return the position number of each item in the sequence $seq. If the second operand is less than the first (which it will be in this example if $seq is an empty sequence), then the range expression returns an empty sequence. If the second operand is equal to the first, the expression returns a single integer, equal to the value of the first operand.
The two operands must both evaluate to single integers. You can use an untyped value provided it is capable of being converted to an integer: for example, you can write «1 to @width» if width is an attribute in a schema-less document containing the value «34». However, you can't use a decimal or a double value without converting it explicitly to an integer. If you write «1 to @width+1», you will get a type error, because the value of «@width+1» is the double value 35.0e0. Instead, write «1 to xs:integer(@width)+1» or «1 to 1 + @width idiv 1».
It's an error if either operand is an empty sequence. For example, this would happen if you ran any of the examples above when the context node did not have a width attribute. Supplying a sequence that contains more than one item is also an error.
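The rules for «to» can be summarized in a short Python model. It is a sketch under stated assumptions: xpath_to is a hypothetical helper, results are tuples, TypeError stands in for XPath's type error, and the automatic conversion of untyped values is not modelled.

```python
def xpath_to(first, last):
    """Model of «first to last» (hypothetical helper, not a real API)."""
    for operand in (first, last):
        # Each operand must be a single integer. bool is excluded explicitly
        # because Python treats it as a subclass of int.
        if not isinstance(operand, int) or isinstance(operand, bool):
            raise TypeError(f"operand must be a single integer, got {operand!r}")
    # range() is empty when last < first, which matches the rule for «to».
    return tuple(range(first, last + 1))

print(xpath_to(1, 5))   # (1, 2, 3, 4, 5)
print(xpath_to(3, 3))   # (3,)
print(xpath_to(1, 0))   # ()
```

In this model, passing 35.0 (a double), None (standing in for an empty sequence), or a tuple of several integers raises TypeError.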
If you want a sequence of integers in reverse order, you can use the reverse() function described in Chapter 10. For example, «reverse(1 to 5)» gives you the sequence «5,4,3,2,1». In an earlier draft of the specification you could achieve this by writing «5 to 1», but the rules were changed because this caused anomalies for the common usage «1 to count($seq)» in the case where $seq is empty.
Although the semantics of this operator are expressed in terms of constructing a sequence, a respectable implementation will evaluate the sequence lazily, which means that when you write «1 to 1000000» it won't actually allocate space in memory to hold a million integers. Depending how you actually use the range expression, in most cases an implementation will be able to iterate over the values one to a million without actually laying them out end-to-end as a list in memory.
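Python's built-in range object is one familiar example of this kind of lazy evaluation. The sketch below is an analogy only and says nothing about how any particular XPath processor is implemented: it shows that a range of a million integers can be summed and indexed without ever being laid out as a list.

```python
import sys

numbers = range(1, 1_000_001)   # describes one to a million; nothing is materialized
print(sys.getsizeof(numbers))   # small, and independent of the length of the range
print(sum(numbers))             # 500000500000
print(numbers[9])               # 10
```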
Examples
Here are some examples of expressions that make use of the «to» operator to construct sequences.
| Expression | Effect |
| for $n in 1 to 10 return $seq[$n] | Returns the first 10 items of the sequence $seq. The «for» expression is described later in this chapter, on page 247 |
| $seq[position() = 1 to 10] | Returns the first 10 items of the sequence $seq. This achieves the same effect as the previous example, but this time using a filter expression alone. It works because the «=» operator compares each item in the first operand (there is only one, the value of position()), with each item in the second operand (that is, each of the integers 1 to 10), and returns true if any of them matches. It's reasonable to expect that XPath processors will optimize this construct so that this doesn't actually involve 10 separate comparisons for each item in the sequence. Note that you can't simply write «$seq[1 to 10]». If the predicate isn't a single number, it is evaluated as a boolean, and the effective boolean value of the sequence «1 to 10» is true, so all the items will be selected |
| string-join(for $i in 1 to $N return " ", "") | Returns a string containing $N space characters |
| for $i in 1 to count($S) return ($S[$i], $T[$i]) | Returns a sequence that contains pairs of corresponding values from the two input sequences $S and $T. For example, if $S is the sequence ("a","b","c") and $T is the sequence ("x","y","z"), the result will be the sequence ("a","x","b","y","c","z") |
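Two of these examples can be traced with a Python model. This is a sketch, not XPath: sequences are tuples, and item_at, interleave, and first_n are hypothetical helper names. The model uses 1-based positions, and an out-of-range position yields the empty sequence, as «$T[$i]» does.

```python
def item_at(seq, position):
    """Model of $seq[$i]: 1-based; an out-of-range position gives the empty sequence."""
    if 1 <= position <= len(seq):
        return (seq[position - 1],)
    return ()

def interleave(s, t):
    """Model of: for $i in 1 to count($S) return ($S[$i], $T[$i])"""
    result = ()
    for i in range(1, len(s) + 1):
        result += item_at(s, i) + item_at(t, i)
    return result

def first_n(seq, n):
    """Model of: $seq[position() = 1 to n]"""
    wanted = range(1, n + 1)   # membership test on a range needs no per-item loop
    return tuple(x for pos, x in enumerate(seq, start=1) if pos in wanted)

print(interleave(("a", "b", "c"), ("x", "y", "z")))   # ('a', 'x', 'b', 'y', 'c', 'z')
print(interleave(("a", "b", "c"), ("x",)))            # ('a', 'x', 'b', 'c')
print(first_n(("p", "q", "r", "s"), 2))               # ('p', 'q')
```

The second call shows what happens when $T is shorter than $S: the missing items simply contribute nothing to the result.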
Filter Expressions
A filter expression is used to apply one or more Predicates to a sequence, selecting those items in the sequence that satisfy some condition.
Expression syntax:
FilterExpr ::= PrimaryExpr Predicate*
Predicate ::= «[» Expr «]»
A FilterExpr consists of a PrimaryExpr whose value is a sequence, followed by zero or more Predicates that select a subset of the items in the sequence. Each predicate consists of an expression enclosed in square brackets, for example «[@name='London']» or «[position()=1]».
The way the syntax is defined, every PrimaryExpr is also a trivial FilterExpr, including simple expressions such as «23», «'Washington'», and «true()».
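The way a predicate selects items can be sketched in Python. This is a simplified model, not XPath: filter_seq is a hypothetical helper, and the predicate is a callback that receives the item, its 1-based position, and the size of the sequence. It covers only predicates that yield a single boolean (keep the item if true) or a single number (keep the item at that position).

```python
def filter_seq(seq, predicate):
    """Model of seq[predicate] for predicates yielding one boolean or one number."""
    last = len(seq)
    selected = []
    for position, item in enumerate(seq, start=1):
        value = predicate(item, position, last)
        if isinstance(value, bool):
            keep = value                 # boolean: keep the item if true
        else:
            keep = (value == position)   # number: keep the item at that position
        if keep:
            selected.append(item)
    return tuple(selected)

cities = ("Paris", "London", "Rome")
print(filter_seq(cities, lambda item, pos, last: item == "London"))   # ('London',)
print(filter_seq(cities, lambda item, pos, last: pos == 1))           # ('Paris',)
print(filter_seq(cities, lambda item, pos, last: 2))                  # ('London',)
print(filter_seq(cities, lambda item, pos, last: last))               # ('Rome',)
```

The first two calls correspond to predicates such as «[@name='London']» and «[position()=1]»; the last two correspond to the numeric forms «[2]» and «[last()]».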
Since in XPath 2.0
Continues...
Excerpted from XPath 2.0 Programmer's Reference by Michael Kay. Excerpted by permission.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.