http://zorba.io/modules/xml

Description

Before using any of the functions below please remember to import the module namespace:

import module namespace x = "http://zorba.io/modules/xml";

This module provides functions for parsing XML from string inputs. It can read well-formed XML documents as well as well-formed external parsed entities, as described in the XML 1.0 specification (Well-Formed Parsed Entities). The functions can also perform Schema and DTD validation of the input documents.

The following example parses a sequence of XML elements and returns them in a streaming fashion, one at a time:

 import module namespace x = "http://zorba.io/modules/xml";
 import schema namespace opt = "http://zorba.io/modules/xml-options";
 x:parse(
   "<from1>Jani</from1><from2>Jani</from2><from3>Jani</from3>",
   <opt:options>
     <opt:parse-external-parsed-entity/>
   </opt:options>
 )
 

Another useful option skips an arbitrary number of levels before returning a sequence of nodes, as shown in the following example:

 import module namespace x = "http://zorba.io/modules/xml";
 import schema namespace opt = "http://zorba.io/modules/xml-options";
 x:parse(
   "<root>
     <from1>Jani1</from1>
     <from2>Jani2</from2>
     <from3>Jani3</from3>
   </root>",
   <opt:options>
     <opt:parse-external-parsed-entity opt:skip-root-nodes="1"/>
   </opt:options>
 )
 

Module code

Here is the actual XQuery module code.

Imported Schemas

Please note that the schemas are not automatically imported in the modules that import this module.

In order to import and use the schemas, please add:

import schema namespace opt =  "http://zorba.io/modules/xml-options";

Authors

Nicolae Brinza, Juan Zacarias

Version Declaration

xquery version "3.0" encoding "utf-8";

Namespaces

err     http://www.w3.org/xqt-errors
opt     http://zorba.io/modules/xml-options
schema  http://zorba.io/modules/schema
ver     http://zorba.io/options/versioning
x       http://zorba.io/modules/xml
zerr    http://zorba.io/errors

Function Summary

parse($xml-string as xs:string?, $options as element(opt:options)?) as node()* external

A function to parse XML files and fragments (i.e. external general parsed entities).

canonicalize($xml-string as xs:string) as xs:string

A function to canonicalize the given XML string, that is, transform it into Canonical XML as defined by the Canonical XML specification.

canonicalize($xml-string as xs:string, $options as element(opt:options)) as xs:string

A function to canonicalize the given XML string, that is, transform it into Canonical XML as defined by the Canonical XML specification.

canonicalize-impl($xml-string as xs:string, $options as element(*)) as xs:string external

Functions

parse#2

declare function x:parse(
    $xml-string as xs:string?,
    $options as element(opt:options)?
) as node()* external

A function to parse XML files and fragments (i.e. external general parsed entities).

The function takes two arguments: the first is the string to be parsed and the second is an <options/> element that passes a list of options to the parsing function. The options are described below. The options element must conform to the xml-options:options element type from the xml-options.xsd schema. Some of the options are passed to the underlying library (LibXml2); further documentation for them can be found in the LibXml2 parser documentation.

The list of available options:
  • <base-uri/> - the element must have a "value" attribute that provides the base URI used for every node returned by this function.
  • <no-error/> - if present, the option will disable fatal error processing. Any failure to parse or validate the input in the requested manner will result in the function returning an empty sequence, and no error will be raised.
  • <schema-validate/> - if present, it will request that the input string be Schema validated. The element accepts an attribute named "mode" which can have two values: "strict" and "lax". Enabling the option will produce a result that is equivalent to processing the input with the option disabled, and then copying the result using the XQuery "validate strict|lax" expression. This option cannot be used together with either the <DTD-validate/> or the <parse-external-parsed-entity/> option. Doing so will raise a zerr:ZXQD0003 error.
  • <DTD-validate/> - the option will enable DTD-based validation. If this option is enabled and the input references a DTD, then the input must be a well-formed and DTD-valid XML document. The <DTD-load/> option must be used for external DTD files to be loaded. If the option is enabled and the input does not reference a DTD, then the option is ignored. If the option is disabled, the input is not required to reference a DTD, and if it does reference a DTD then the DTD is ignored for validation purposes. This option cannot be used together with either the <schema-validate/> or the <parse-external-parsed-entity/> option. Doing so will raise a zerr:ZXQD0003 error.
  • <DTD-load/> - if present, it will enable loading of external DTD files.
  • <default-DTD-attributes/> - if present, it will enable the default DTD attributes.
  • <parse-external-parsed-entity/> - if present, it will enable the processing of XML external parsed entities. If the option is enabled, the input must conform to the syntax extParsedEnt (production [78] in XML 1.0, see Well-Formed Parsed Entities). In addition, by default a DOCTYPE declaration is allowed, as described by the doctypedecl production (production [28], see Document Type Definition). A parameter is available to forbid the appearance of the DOCTYPE.

    The result of the function call is a list of nodes corresponding to the top-level components of the content of the external entity: that is, elements, processing instructions, comments, and text nodes. CDATA sections and character references are expanded, and adjacent characters are merged so the result contains no adjacent text nodes.

    If the option is disabled, the input must be a well-formed XML document conforming to the Document production (production [1] in XML 1.0).

    This option cannot be used together with either the <schema-validate/> or the <DTD-validate/> option. Doing so will raise a zerr:ZXQD0003 error.

    The <parse-external-parsed-entity/> option has three parameters, given by attributes.

    The first attribute is "skip-root-nodes" and it takes a non-negative value. Specifying the parameter tells the parser to skip the given number of root nodes and return only their children. E.g. skip-root-nodes="1" is equivalent to parse-xml($xml-string)/node()/node(), skip-root-nodes="2" is equivalent to parse-xml($xml-string)/node()/node()/node(), etc.

    The second attribute is "skip-top-level-text-nodes" with a boolean value. Specifying "true" tells the parser to skip top-level text nodes, returning only the top-level elements, comments, PIs, etc. This parameter works in combination with the "skip-root-nodes" parameter: top-level text nodes are skipped after "skip-root-nodes" has been applied.

    The third attribute is "error-on-doctype". It will generate an error if a DOCTYPE declaration appears in the input, which by default is allowed.
  • <substitute-entities/> - if present, it will enable XML entity substitution.
  • <remove-redundant-ns/> - if present, the parser will remove redundant namespace declarations.
  • <no-CDATA/> - if present, the parser will merge CDATA nodes into text nodes.
  • <xinclude-substitutions/> - if present, it will enable the XInclude substitutions.
  • <no-xinclude-nodes/> - if present, the parser will not generate XInclude START/END nodes.
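
The attributes of <parse-external-parsed-entity/> can be combined. The following is a minimal sketch, not taken from the module itself: it assumes that "skip-top-level-text-nodes" is written in the opt namespace in the same way as the opt:skip-root-nodes attribute used in the module description. With both parameters set, the whitespace between the children of <root> should be dropped and only the three elements returned.

```xquery
import module namespace x = "http://zorba.io/modules/xml";
import schema namespace opt = "http://zorba.io/modules/xml-options";

(: Skip the <root> wrapper first, then drop the whitespace-only text
   nodes that sit between its children.
   Assumption: the attribute is namespace-qualified, like
   opt:skip-root-nodes in the module's own example. :)
x:parse(
  "<root>
     <from1>Jani1</from1>
     <from2>Jani2</from2>
     <from3>Jani3</from3>
   </root>",
  <opt:options>
    <opt:parse-external-parsed-entity
        opt:skip-root-nodes="1"
        opt:skip-top-level-text-nodes="true"/>
  </opt:options>
)
```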

An example that sets the base-uri of the parsed external entities:

   import module namespace x = "http://zorba.io/modules/xml";
   import schema namespace opt = "http://zorba.io/modules/xml-options";
   x:parse("<from1>Jani</from1><from2>Jani</from2><from3>Jani</from3>",
     <opt:options>
       <opt:base-uri opt:value="urn:test"/>
       <opt:parse-external-parsed-entity/>
     </opt:options>
   )
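
The <no-error/> option described above can be illustrated in the same way. This is a sketch under the documented behaviour only: the input string is deliberately not well-formed, so the call is expected to return an empty sequence instead of raising an error. The message string is purely illustrative.

```xquery
import module namespace x = "http://zorba.io/modules/xml";
import schema namespace opt = "http://zorba.io/modules/xml-options";

(: The closing tag does not match <b>, so parsing fails.
   With <opt:no-error/> the failure is reported as an empty sequence. :)
let $result :=
  x:parse(
    "<a><b></a>",
    <opt:options>
      <opt:no-error/>
    </opt:options>
  )
return
  if (empty($result))
  then "input could not be parsed"
  else $result
```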
 

Parameters

  • $xml-string

    The string that holds the XML to be parsed. If empty, the function will return an empty sequence.

  • $options

    The options for parsing.

Returns

  • node()*

    The parsed XML as a document node or a list of nodes, or an empty sequence.

Errors

  • zerr:ZXQD0003

    The error will be raised if the options to the function are inconsistent.

  • err:FODC0006

    The error will be raised if the input string is not a valid XML document or fragment (external general parsed entity) or if DTD validation was enabled and the document has not passed it.

  • err:XQDY0027

    The error will be raised if schema validation was enabled and the input document has not passed it or if the parsing options are not conformant to the xml-options.xsd schema.
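
When the <no-error/> option is not used, the errors listed above can be handled with a standard XQuery 3.0 try/catch expression. The sketch below uses a wildcard catch so that it does not depend on any particular prefix binding; the <parse-failed/> element is a hypothetical placeholder, not something the module produces.

```xquery
import module namespace x = "http://zorba.io/modules/xml";

try {
  (: The closing tag does not match <b>, so the input is not a
     well-formed XML document and parsing is expected to fail. :)
  x:parse("<a><b></a>", ())
} catch * {
  (: Hypothetical fallback value chosen for this illustration. :)
  <parse-failed/>
}
```

Catching a specific code such as err:FODC0006 instead of the wildcard requires the err prefix to be bound to the error namespace in the query prolog.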


canonicalize#1

declare function x:canonicalize(
    $xml-string as xs:string
) as xs:string

A function to canonicalize the given XML string, that is, transform it into Canonical XML as defined by the Canonical XML specification.

Note: This function is not streamable. If a streamable string is used as input for the function it will be materialized.

Note: This function sets the XML_PARSE_NOERROR option when parsing the XML input.

Parameters

  • $xml-string

    a string representation of a well-formed XML document to canonicalize. XML fragments are not allowed.

Returns

  • xs:string

    the canonicalized XML string.

Errors

  • err:CANO0001

    invalid input.
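
A minimal usage sketch follows; the input string is made up for illustration. Canonical XML orders attributes, writes attribute values in double quotes, and expands empty elements into start/end tag pairs, so the returned string should differ from the input in those respects.

```xquery
import module namespace x = "http://zorba.io/modules/xml";

(: Attributes are out of order, single-quoted, and <c/> is an
   empty-element tag; canonicalization should normalize all three. :)
x:canonicalize("<a b='1' a='2'><c/></a>")
```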

canonicalize#2

declare function x:canonicalize(
    $xml-string as xs:string,
    $options as element(opt:options)
) as xs:string

A function to canonicalize the given XML string, that is, transform it into Canonical XML as defined by the Canonical XML specification.

This version of the function allows specifying certain options to be used when initially parsing the XML string. These are of the same form as the options to x:parse#2(), although the following options are currently ignored for this function:

  • <opt:no-error/>
  • <opt:base-uri/>
  • <opt:schema-validate/>
  • <opt:parse-external-parsed-entity/>

Note: This function is not streamable. If a streamable string is used as input for the function it will be materialized.

Note: This function sets the XML_PARSE_NOERROR option when parsing the XML input.

Parameters

  • $xml-string

    a string representation of a well-formed XML document to canonicalize. XML fragments are not allowed.

  • $options

    an XML element containing options for the canonicalize function.

Returns

  • xs:string

    the canonicalized XML string.

Errors

  • err:CANO0001

    invalid input.
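
The two-argument form might be called as in the sketch below. The input string is made up for illustration, and the choice of <opt:no-CDATA/> is arbitrary: any option that is not in the ignored list above could be passed in the same way.

```xquery
import module namespace x = "http://zorba.io/modules/xml";
import schema namespace opt = "http://zorba.io/modules/xml-options";

(: Pass a parsing option to the initial parse step; here the CDATA
   section is merged into ordinary text before canonicalization. :)
x:canonicalize(
  "<root><![CDATA[a < b]]></root>",
  <opt:options>
    <opt:no-CDATA/>
  </opt:options>
)
```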

canonicalize-impl#2

declare %private function x:canonicalize-impl(
    $xml-string as xs:string,
    $options as element(*)
) as xs:string external

Returns

  • xs:string