http://www.zorba-xquery.com/modules/converters/html

Description

Before using any of the functions below, import the module namespace:

import module namespace html = "http://www.zorba-xquery.com/modules/converters/html";

This module provides functions to tidy an HTML document.
Each function takes an HTML document as a string, tidies it into valid XHTML, and returns the result as a document node.

Module code

Here is the actual XQuery module code.

Imported Schemas

Note that the schemas imported by this module are not automatically imported into modules that import this module.

To import and use the schema, add:

import schema namespace html-options = "http://www.zorba-xquery.com/modules/converters/html-options";

Imported modules

Authors

Sorin Nasoi

Version Declaration

xquery version "3.0" encoding "utf-8";

Namespaces

err           http://www.w3.org/2005/xqt-errors
html          http://www.zorba-xquery.com/modules/converters/html
html-options  http://www.zorba-xquery.com/modules/converters/html-options
schema        http://zorba.io/modules/schema
ver           http://zorba.io/options/versioning

Function Summary

parse($html as xs:string) as document()

This function tidies the given HTML string and returns a valid XHTML document node.

parse($html as xs:string, $options as element(html-options:options)) as document()

This function tidies the given HTML string and returns a valid XHTML document node.

parse-internal($html as xs:string, $options as element(html-options:options)) as document() external

Functions

parse#1

declare function html:parse(
    $html as xs:string
) as document()

This function tidies the given HTML string and returns a valid XHTML document node.

This function automatically sets the following tidying parameters:

  • output-xml=yes
  • doctype=omit
  • quote-nbsp=no
  • char-encoding=utf8
  • newline=LF
  • tidy-mark=no

Parameters

  • $html

    the HTML string to tidy

Returns

  • document()

    the tidied XHTML document node

Errors

  • html:InternalError

    if an internal error occurred while tidying the string.

Examples
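
A minimal sketch of a one-argument call, under the assumption that the module is installed and resolvable by its namespace URI. The input string is made up for illustration; the exact shape of the returned document depends on the underlying tidy library.

```xquery
xquery version "3.0";

import module namespace html = "http://www.zorba-xquery.com/modules/converters/html";

(: Hypothetical input: the <p> element is never closed and <br> is not self-closed. :)
let $input := "<html><head><title>Test</title></head><body><p>Hello<br>World</body></html>"

(: html:parse#1 tidies the string with the default parameters listed above
   and returns a document node. :)
return html:parse($input)
```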

parse#2

declare function html:parse(
    $html as xs:string,
    $options as element(html-options:options)
) as document()

This function tidies the given HTML string and returns a valid XHTML document node.

The second parameter specifies options that configure the tidy process. It is an html-options:options element that carries a set of name/value pairs. Allowed option names and values are documented at http://tidy.sourceforge.net/docs/quickref.html.

Parameters

  • $html

    the HTML string to tidy

  • $options

    a set of name/value pairs that configure the tidy process. The element must be valid against the "http://www.zorba-xquery.com/modules/converters/html-options" schema.

Returns

  • document()

    the tidied XHTML document node

Errors

  • err:XQDY0027

    if $options cannot be validated against the html-options schema

  • html:TidyOption

    if one of the options in the $options parameter is in error in a way that schema validation could not catch

  • html:InternalError

    if an internal error occurred while tidying the string.

Examples
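
A hedged sketch of a two-argument call with error handling. The tidyParam child elements and their name and value attributes are an assumption about the html-options schema, which this page does not show; check the schema before relying on them. The input string and the error element built in the catch clause are likewise illustrative only.

```xquery
xquery version "3.0";

import module namespace html = "http://www.zorba-xquery.com/modules/converters/html";

declare namespace err = "http://www.w3.org/2005/xqt-errors";

(: Assumption: the html-options schema declares tidyParam children
   carrying name and value attributes. :)
let $options :=
  <options xmlns="http://www.zorba-xquery.com/modules/converters/html-options">
    <tidyParam name="output-xml" value="yes"/>
    <tidyParam name="doctype" value="omit"/>
    <tidyParam name="char-encoding" value="utf8"/>
    <tidyParam name="tidy-mark" value="no"/>
  </options>
return
  try {
    (: Hypothetical input fragment with an unclosed <p> element. :)
    html:parse("<p>Hello<br>World", $options)
  } catch html:TidyOption | html:InternalError {
    (: $err:code and $err:description are bound inside the catch clause. :)
    <error code="{$err:code}">{$err:description}</error>
  }
```

The catch clause names only the two module-specific errors listed above; an err:XQDY0027 raised by a schema-invalid $options element would propagate to the caller unless it is added to the catch list.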

parse-internal#2

declare %:private function html:parse-internal(
    $html as xs:string,
    $options as element(html-options:options)
) as document() external

Returns

  • document()