René Nyffenegger's collection of things on the web

XPath

XPath is a query language for locating nodes and fragments in XML trees. It is closely related to XPointer.
XPath provides a common syntax for functionality shared by XPointer and XSLT.

Location paths

A location path is a sequence of location steps. The location steps are separated by a slash (/).

Location step

A location step looks like
axis::node-test[predicate]

axis

The axis selects a set of nodes that are candidates for the result.

node-test

The node-test examines the candidates and filters them based on node type (element, character data, etc.) and name (e.g. element name, attribute name).

predicate

The predicate filters the nodes that passed the node-test further.
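The three parts of a location step can be pulled apart mechanically. The following is only a rough sketch: split_step and the STEP pattern are hypothetical names made up for this illustration, and the pattern handles a single simple step with at most one predicate, not the full XPath grammar.

```python
import re

# Hypothetical pattern for ONE simple step: optional "axis::", a node-test,
# and an optional "[predicate]". This is not a full XPath parser.
STEP = re.compile(
    r"^(?:(?P<axis>[a-z-]+)::)?(?P<test>[^\[\]]+)(?:\[(?P<pred>.*)\])?$"
)

def split_step(step):
    """Return (axis, node_test, predicate) for a simple location step."""
    m = STEP.match(step)
    if m is None:
        raise ValueError(f"not a simple location step: {step!r}")
    # child:: is the default axis when none is written.
    axis = m.group("axis") or "child"
    return axis, m.group("test"), m.group("pred")

# A location path is a sequence of steps separated by slashes.
path = "child::a/descendant::x[position()=2]/attribute::id"
steps = [split_step(s) for s in path.split("/")]
for axis, test, pred in steps:
    print(axis, test, pred)
```

Splitting on every slash is a simplification as well: it would break on predicates that themselves contain a path.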

Available axes

/

In an XPath expression / denotes the root of the document tree.
The slash is also used as a path separator to step from a node to its child nodes. Consider the following document:
<a>
  <b><x>one</x></b>
  <c><x>two</x></c>
  <d><x>three</x></d>
</a>
Given this document, the following expression
/a/c/x
returns
<x>two</x>
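The same lookup can be tried out with Python's standard xml.etree.ElementTree module. This is a sketch under one assumption worth stating: ElementTree implements only a subset of XPath and evaluates paths relative to the element it is called on, so the absolute path /a/c/x is written as c/x starting from the a element.

```python
import xml.etree.ElementTree as ET

doc = """<a>
  <b><x>one</x></b>
  <c><x>two</x></c>
  <d><x>three</x></d>
</a>"""

root = ET.fromstring(doc)  # root is the <a> element

# /a/c/x, rewritten relative to <a> because ElementTree has no absolute paths.
matches = root.findall("c/x")
texts = [x.text for x in matches]
print(texts)
```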

child::

Returns the children of the context node.
Can be abbreviated by leaving child:: out entirely. That is, child is the default axis.

descendant::

All descendants, not only children, but also the children's children and so on.

parent::

The parent of the context node, or an empty node-set if the context node is the document root.

ancestor::

The parent, the parent's parent, the parent's parent's parent, and so on up to the root.

following-sibling::

Brothers and sisters to the right, that is, siblings that come after the context node in document order.

preceding-sibling::

Brothers and sisters to the left, that is, siblings that come before the context node in document order.
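The two sibling axes can be imitated by hand in Python. This is a sketch, not an XPath evaluation: xml.etree.ElementTree does not support these axes, so the code slices the parent's child list instead. The small document is made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Made-up document: <a> with four children; <c> plays the context node.
root = ET.fromstring("<a><b/><c/><d/><e/></a>")
children = list(root)          # child elements of <a> in document order
context = root.find("c")
idx = children.index(context)  # position of the context node among its siblings

following_siblings = children[idx + 1:]  # siblings to the right
preceding_siblings = children[:idx]      # siblings to the left

print([e.tag for e in following_siblings])
print([e.tag for e in preceding_siblings])
```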

following::

All following nodes in the document minus descendants.

preceding::

All previous nodes in the document minus ancestors.

attribute::

<a>
  <b x="1">one</b>
  <c x="2">two</c>
</a>
Given the preceding document, the following XPath expression will return 2 (note that names in XPath are case sensitive):
/a/c/attribute::x
Given the same document, attribute:: can also be specified as predicate:
/a/*[attribute::x=2]
It will return:
<c x="2">two</c>
attribute:: can be abbreviated with the @ symbol.
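Both attribute lookups can be approximated with Python's xml.etree.ElementTree. Two limitations of that module are assumed here: it cannot return attribute nodes, so the element is selected first and the attribute read from it, and its predicates only compare attribute values as quoted strings.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<a><b x="1">one</b><c x="2">two</c></a>')

# /a/c/@x : select <c>, then read its x attribute.
value = root.find("c").get("x")

# /a/*[@x=2] : the value must be written as a quoted string in ElementTree.
selected = root.findall("*[@x='2']")

print(value)
print([(e.tag, e.text) for e in selected])
```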

namespace::

self::

Returns the context node itself.

descendant-or-self::

Consider the following document:
<a>
  <x>one</x>
  <b><x>two</x></b>
  <c><d><x>three</x></d></c>
  <e><f><g><x>four</x></g></f></e>
  <e><f><g><x><!-- some comment --><z>five</z></x></g></f></e>
</a>
Given the document above, the following XPath expression
/a/descendant-or-self::x
returns
<x>one</x>
<x>two</x>
<x>three</x>
<x>four</x>
<x>
  <!-- some comment -->
  <z>five</z>
</x>
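A similar query can be run with Python's xml.etree.ElementTree, which does not know the descendant-or-self:: axis by name but accepts the abbreviated form .//x. Evaluated from the a element this selects every x descendant; since a is not itself named x, the result is the same set of elements. The sketch avoids reading the comment because the default parser discards comments.

```python
import xml.etree.ElementTree as ET

doc = """<a>
  <x>one</x>
  <b><x>two</x></b>
  <c><d><x>three</x></d></c>
  <e><f><g><x>four</x></g></f></e>
  <e><f><g><x><!-- some comment --><z>five</z></x></g></f></e>
</a>"""

root = ET.fromstring(doc)

# .//x : all <x> elements at any depth below <a>, in document order.
xs = root.findall(".//x")

first_four = [x.text for x in xs[:4]]
last_child = xs[4].find("z").text  # the fifth <x> wraps a <z> element

print(len(xs), first_four, last_child)
```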

ancestor-or-self::

The ancestors of the context node plus the context node itself.

Node Tests

text(), comment(), processing-instruction() and node() test the node's type.

text()

Text nodes.

comment()

Comment nodes.

processing-instruction()

Processing instruction nodes.

node()

All nodes except attributes and namespace declarations.

name

Tests for the name.

*

The star is a wildcard that matches any element, but exactly one level of the tree.
Consider the following XML document:
<a>
  <x>zero</x>
  <b><x>one</x></b>
  <c><y>two</y></c>
  <d><x>three</x></d>
  <e><f><x>four</x></f></e>
</a>
Then this XPath expression:
/a/*/x
returns
<x>one</x>
<x>three</x>
Note: neither zero nor four is returned, although both are in an x tag: zero is a direct child of a, and four is nested one level too deep.
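The wildcard behaves the same way in Python's xml.etree.ElementTree, which makes the example easy to check. As a sketch, the absolute path /a/*/x is written relative to the a element as */x, because ElementTree evaluates paths from the element it is called on.

```python
import xml.etree.ElementTree as ET

doc = """<a>
  <x>zero</x>
  <b><x>one</x></b>
  <c><y>two</y></c>
  <d><x>three</x></d>
  <e><f><x>four</x></f></e>
</a>"""

root = ET.fromstring(doc)

# */x : an <x> that sits exactly two levels below <a>, whatever the element in between.
texts = [x.text for x in root.findall("*/x")]
print(texts)
```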

Functions

last

position

A document full of things:
<things>
  <numbers><item>1</item><item>59</item></numbers>
  <animals><item>bird</item><item>cat</item><item>dog</item></animals>
</things>
Now, let's find the 2nd animal:
//animals/item[position()=2]
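In full XPath this expression selects the item containing cat. Python's xml.etree.ElementTree does not implement the position() function, but it accepts the abbreviated predicate [2], which XPath defines as equivalent to [position()=2]. The sketch below relies on that abbreviation and also shows [last()], which ElementTree supports as well.

```python
import xml.etree.ElementTree as ET

doc = """<things>
  <numbers><item>1</item><item>59</item></numbers>
  <animals><item>bird</item><item>cat</item><item>dog</item></animals>
</things>"""

root = ET.fromstring(doc)

# //animals/item[position()=2], written with the [2] abbreviation.
second = [e.text for e in root.findall(".//animals/item[2]")]

# item[last()] picks the last item child of its parent.
last = [e.text for e in root.findall(".//animals/item[last()]")]

print(second, last)
```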

count

id

local-name

namespace-uri

name

string

concat

starts-with

contains

substring-before

substring-after

substring

string-length

normalize-space

translate

boolean

not

true

Returns true

false

Returns false

lang

number

sum

floor

ceiling

round

Abbreviations

nothing

Leaving the axis out is an abbreviation for child::, the default axis.

@

The @ symbol is an abbreviation for attribute::. The following two expressions are equivalent:
/a/c/@x
/a/c/attribute::x

//

// is an abbreviation for /descendant-or-self::node()/.
See the descendant-or-self:: section above for an example.

.

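. is an abbreviation for self::node(), the context node. A relative location path is evaluated from the context node, whereas a location path starting with a slash (/) is evaluated from the root.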

..

.. is an abbreviation for parent::node().

Examples

  • elem
    matches any element named elem.
  • *
    matches any element
  • elem_1|elem_2
    matches any elem_1 or elem_2
  • elem_parent/elem_child
    matches elem_child whose parent node is elem_parent
    Similarly, / matches the document root
  • elem_ancestor//elem
    matches any element named elem with an ancestor element named elem_ancestor
  • text()
    matches any text node.
  • processing-instruction()
    matches any processing instruction.
  • node()
    matches any node that is not an attribute node or root node.
  • id("foo")
    matches the element with unique ID foo
  • elem[1]
    matches any element named elem that is the first elem child element of its parent
  • *[position()=1 and self::para]
    matches any para element that is the first child element of its parent
  • para[last()=1]
    matches any para element that is the only para child element of its parent
  • elem_parent/elem_child[position()>1]
    matches any element named elem_child that has a parent element named elem_parent and that is not the first elem_child child of its parent
  • elem[position() mod 2 = 1]
    matches any element named elem that is an odd-numbered elem child of its parent.
  • div[@attr="appendix"]//p
    matches any p element within a div ancestor element that has an attribute named attr with value appendix
  • @attr
    matches any attribute named attr. It does not match elements that have an attribute named attr.
  • @*
    matches any attribute
  • id("foo")/child::para[position()=5]
    selects the fifth para child of the element with unique ID foo
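A few of the patterns above can be tried with Python's xml.etree.ElementTree. This is a sketch only: the document is made up for the illustration, and ElementTree covers just part of XPath (for instance, it has no | operator, no id() function and no position() function), so only patterns within that subset are shown.

```python
import xml.etree.ElementTree as ET

# Made-up document for trying out the patterns.
doc = """<doc>
  <div attr="appendix"><p>first</p><p>second</p></div>
  <div attr="chapter"><p>other</p></div>
</doc>"""

root = ET.fromstring(doc)

# * : any child element of the context node.
any_child = [e.tag for e in root.findall("*")]

# div[@attr="appendix"]//p : p elements below a div whose attr is appendix.
in_appendix = [e.text for e in root.findall(".//div[@attr='appendix']//p")]

# p[1] : every p that is the first p child of its parent.
first_p = [e.text for e in root.findall(".//p[1]")]

print(any_child, in_appendix, first_p)
```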

Links

XPath visualizer written in JavaScript.