http://zorba.io/modules/data-cleaning/normalization

Description

Before using any of the functions below please remember to import the module namespace:

import module namespace normalization = "http://zorba.io/modules/data-cleaning/normalization";

This library module provides data normalization functions for processing calendar dates, temporal values, currency values, units of measurement, location names and postal addresses. These functions are particularly useful for converting different data representations into canonical formats.

The logic contained in this module is not specific to any particular XQuery implementation.

Authors

Bruno Martins and Diogo Simões

Version Declaration

xquery version "3.0" encoding "utf-8";

Namespaces

an             http://zorba.io/annotations
http           http://www.zorba-xquery.com/modules/http-client
normalization  http://zorba.io/modules/data-cleaning/normalization
ver            http://zorba.io/options/versioning

Function Summary

to-date($sd as xs:string, $format as xs:string?) as xs:string

Converts a given string representation of a date value into a date representation valid according to the corresponding XML Schema type.

to-time($sd as xs:string, $format as xs:string?) as xs:string?

Converts a given string representation of a time value into a time representation valid according to the corresponding XML Schema type.

to-dateTime($sd as xs:string, $format as xs:string?) as xs:string

Converts a given string representation of a dateTime value into a dateTime representation valid according to the corresponding XML Schema type.

normalize-address($addr as xs:string*) as xs:string*

Uses an address normalization Web service to convert a postal address given as input into a canonical representation format.

normalize-phone($addr as xs:string*) as xs:string*

Uses a phone number normalization Web service to convert a phone number given as input into a canonical representation.

timeZone-dictionary() as element(*)

Internal auxiliary function that returns an XML representation of a dictionary containing the time-shift value associated with each time-zone abbreviation.

month-dictionary() as element(*)

Internal auxiliary function that returns an XML representation of a dictionary containing the numeric value associated with each month name abbreviation.

check-dateTime($dateTime as xs:string) as xs:string

Internal auxiliary function that checks if a string is in xs:dateTime format.

check-date($date as xs:string) as xs:string

Internal auxiliary function that checks if a string is in xs:date format.

check-time($Time as xs:string) as xs:string

Internal auxiliary function that checks if a string is in xs:time format.

Functions

to-date#2

declare function normalization:to-date(
    $sd as xs:string,
    $format as xs:string?
) as xs:string

Converts a given string representation of a date value into a date representation valid according to the corresponding XML Schema type.

Parameters

  • $sd

    The string representation for the date.

  • $format

    An optional parameter denoting the format used to represent the date in the string, given as a sequence of conversion specifications. In the format string, a conversion specification is introduced by '%', usually followed by a single letter, or by 'O' or 'E' and then a single letter. Any character in the format string that is not part of a conversion specification is interpreted literally, and the string '%%' gives '%'. The supported conversion specifications are as follows:

      '%b'  Abbreviated month name in the current locale.
      '%B'  Full month name in the current locale.
      '%d'  Day of the month as decimal number (01-31).
      '%m'  Month as decimal number (01-12).
      '%x'  Date, locale-specific.
      '%y'  Year without century (00-99).
      '%Y'  Year with century.
      '%C'  Century (00-99): the integer part of the year divided by 100.
      '%D'  Locale-specific date format such as '%m/%d/%y'.
      '%e'  Day of the month as decimal number (1-31), with a leading space for a single-digit number.
      '%F'  Equivalent to '%Y-%m-%d' (the ISO 8601 date format).
      '%h'  Equivalent to '%b'.

Returns

  • xs:string

    The date value resulting from the conversion.

Examples
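
A minimal sketch of a call, assuming the module is installed and resolvable by the XQuery processor. The input string and format are illustrative values chosen for this sketch:

```xquery
xquery version "3.0";

import module namespace normalization = "http://zorba.io/modules/data-cleaning/normalization";

(: Illustrative input: %d = day of the month, %b = abbreviated month name,
   %Y = year with century. :)
normalization:to-date("24OCT2002", "%d%b%Y")
```

If the conversion succeeds, the result should be a string in the xs:date lexical form (YYYY-MM-DD).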

to-time#2

declare function normalization:to-time(
    $sd as xs:string,
    $format as xs:string?
) as xs:string?

Converts a given string representation of a time value into a time representation valid according to the corresponding XML Schema type.

Parameters

  • $sd

    The string representation for the time.

  • $format

    An optional parameter denoting the format used to represent the time in the string, given as a sequence of conversion specifications. In the format string, a conversion specification is introduced by '%', usually followed by a single letter, or by 'O' or 'E' and then a single letter. Any character in the format string that is not part of a conversion specification is interpreted literally, and the string '%%' gives '%'. The supported conversion specifications are as follows:

      '%H'  Hours as decimal number (00-23).
      '%I'  Hours as decimal number (01-12).
      '%M'  Minute as decimal number (00-59).
      '%p'  AM/PM indicator in the locale. Used in conjunction with '%I' and *not* with '%H'.
      '%S'  Second as decimal number (00-61), allowing for up to two leap-seconds.
      '%X'  Time, locale-specific.
      '%z'  Offset from Greenwich, so '-0900' is 9 hours west of Greenwich.
      '%Z'  Time zone as a character string.
      '%k'  The 24-hour clock time with single digits preceded by a blank.
      '%l'  The 12-hour clock time with single digits preceded by a blank.
      '%r'  The 12-hour clock time (using the locale's AM or PM).
      '%R'  Equivalent to '%H:%M'.
      '%T'  Equivalent to '%H:%M:%S'.

Returns

  • xs:string?

    The time value resulting from the conversion.

Errors

  • normalization:NOTSUPPORTED

    if the date type is not known to the service.

Examples
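
A minimal sketch showing how literal text in the format string is matched against the input, assuming the module is installed and resolvable by the XQuery processor. The input string and format are illustrative values chosen for this sketch:

```xquery
xquery version "3.0";

import module namespace normalization = "http://zorba.io/modules/data-cleaning/normalization";

(: Illustrative input: %H = hours (00-23), %M = minutes (00-59);
   the words "hours" and "minutes" are literal characters in the format. :)
normalization:to-time("09 hours 10 minutes", "%H hours %M minutes")
```

If the conversion succeeds, the result should be a string in the xs:time lexical form (hh:mm:ss). The declared return type is xs:string?, so callers should be prepared for an empty sequence.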

to-dateTime#2

declare function normalization:to-dateTime(
    $sd as xs:string,
    $format as xs:string?
) as xs:string

Converts a given string representation of a dateTime value into a dateTime representation valid according to the corresponding XML Schema type.

Parameters

  • $sd

    The string representation for the dateTime.

  • $format

    An optional parameter denoting the format used to represent the dateTime in the string, given as a sequence of conversion specifications. In the format string, a conversion specification is introduced by '%', usually followed by a single letter, or by 'O' or 'E' and then a single letter. Any character in the format string that is not part of a conversion specification is interpreted literally, and the string '%%' gives '%'. The supported conversion specifications are as follows:

      '%b'  Abbreviated month name in the current locale.
      '%B'  Full month name in the current locale.
      '%c'  Date and time, locale-specific.
      '%C'  Century (00-99): the integer part of the year divided by 100.
      '%d'  Day of the month as decimal number (01-31).
      '%H'  Hours as decimal number (00-23).
      '%I'  Hours as decimal number (01-12).
      '%j'  Day of year as decimal number (001-366).
      '%m'  Month as decimal number (01-12).
      '%M'  Minute as decimal number (00-59).
      '%p'  AM/PM indicator in the locale. Used in conjunction with '%I' and *not* with '%H'.
      '%S'  Second as decimal number (00-61), allowing for up to two leap-seconds.
      '%x'  Date, locale-specific.
      '%X'  Time, locale-specific.
      '%y'  Year without century (00-99).
      '%Y'  Year with century.
      '%z'  Offset from Greenwich, so '-0900' is 9 hours west of Greenwich.
      '%Z'  Time zone as a character string.
      '%D'  Locale-specific date format such as '%m/%d/%y': ISO C99 says it should be that exact format.
      '%e'  Day of the month as decimal number (1-31), with a leading space for a single-digit number.
      '%F'  Equivalent to '%Y-%m-%d' (the ISO 8601 date format).
      '%g'  The last two digits of the week-based year (see '%V').
      '%G'  The week-based year (see '%V') as a decimal number.
      '%h'  Equivalent to '%b'.
      '%k'  The 24-hour clock time with single digits preceded by a blank.
      '%l'  The 12-hour clock time with single digits preceded by a blank.
      '%r'  The 12-hour clock time (using the locale's AM or PM).
      '%R'  Equivalent to '%H:%M'.
      '%T'  Equivalent to '%H:%M:%S'.

Returns

  • xs:string

    The dateTime value resulting from the conversion.

Errors

  • normalization:NOTSUPPORTED

    if the dateTime type is not known to the service.

Examples
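
A minimal sketch combining date and time conversion specifications in one format string, assuming the module is installed and resolvable by the XQuery processor. The input string and format are illustrative values chosen for this sketch:

```xquery
xquery version "3.0";

import module namespace normalization = "http://zorba.io/modules/data-cleaning/normalization";

(: Illustrative input: date part "%d%b%Y", a literal space,
   then time part "%H:%M" with a literal colon. :)
normalization:to-dateTime("24OCT2002 21:22", "%d%b%Y %H:%M")
```

If the conversion succeeds, the result should be a string in the xs:dateTime lexical form (YYYY-MM-DDThh:mm:ss).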

normalize-address#1

declare %an:nondeterministic function normalization:normalize-address(
    $addr as xs:string*
) as xs:string*

Uses an address normalization Web service to convert a postal address given as input into a canonical representation format.

Parameters

  • $addr

    A sequence of strings encoding an address, where each string in the sequence corresponds to a different component (e.g., street, city, country, etc.) of the address.

Returns

  • xs:string*

    A sequence of strings with the address encoded in a canonical format, where each string in the sequence corresponds to a different component (e.g., street, city, country, etc.) of the address.

Examples
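
A minimal sketch of a call, assuming the module is installed and that the remote Web service it depends on is reachable. The address components are illustrative values chosen for this sketch:

```xquery
xquery version "3.0";

import module namespace normalization = "http://zorba.io/modules/data-cleaning/normalization";

(: Each string is one component of the address (here: a street or square
   name and a city). The function is declared %an:nondeterministic, so
   results depend on the external service and may vary between calls. :)
normalization:normalize-address(("Marques de Pombal", "Lisboa"))
```

The result is a sequence of strings, one per normalized address component. Its exact content and ordering are determined by the Web service.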

normalize-phone#1

declare function normalization:normalize-phone(
    $addr as xs:string*
) as xs:string*

Uses a phone number normalization Web service to convert a phone number given as input into a canonical representation.

Parameters

  • $addr

    A string encoding a phone number.

Returns

  • xs:string*

    A string with the phone number encoded in a canonical format. Note: this function is not yet implemented.

timeZone-dictionary#0

declare %private function normalization:timeZone-dictionary() as element(*)

Internal auxiliary function that returns an XML representation of a dictionary containing the time-shift value associated with each time-zone abbreviation.

Returns

  • element(*)

month-dictionary#0

declare %private function normalization:month-dictionary() as element(*)

Internal auxiliary function that returns an XML representation of a dictionary containing the numeric value associated with each month name abbreviation.

Returns

  • element(*)

check-dateTime#1

declare %private function normalization:check-dateTime(
    $dateTime as xs:string
) as xs:string

Internal auxiliary function that checks if a string is in xs:dateTime format.

Parameters

  • $dateTime

    The string representation for the dateTime.

Returns

  • xs:string

    The dateTime string if it represents the xs:dateTime format.

check-date#1

declare %private function normalization:check-date(
    $date as xs:string
) as xs:string

Internal auxiliary function that checks if a string is in xs:date format.

Parameters

  • $date

    The string representation for the date.

Returns

  • xs:string

    The date string if it represents the xs:date format.

check-time#1

declare %private function normalization:check-time(
    $Time as xs:string
) as xs:string

Internal auxiliary function that checks if a string is in xs:time format.

Parameters

  • $Time

    The string representation for the time.

Returns

  • xs:string

    The time string if it represents the xs:time format.
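
The check-* functions are private to the module and cannot be called from a query. A comparable check can be sketched with the standard `castable as` expression. The helper `local:check-date` below is hypothetical and is not the module's implementation; in particular, returning the empty sequence for invalid input is an assumption of this sketch:

```xquery
xquery version "3.0";

(: Hypothetical helper, not part of the module: returns the input string
   if it is a valid xs:date lexical form, otherwise the empty sequence. :)
declare function local:check-date($date as xs:string) as xs:string?
{
  if ($date castable as xs:date) then $date else ()
};

(: The first call returns its input; the second returns the empty sequence
   because "24/10/2002" is not in the xs:date lexical form. :)
(
  local:check-date("2002-10-24"),
  local:check-date("24/10/2002")
)
```

The same pattern applies to times and dateTimes by substituting xs:time or xs:dateTime in the `castable as` test.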