Data Definition Facility

Zorba supports collections, indexes, and integrity constraints. This is accomplished via a combination of new prolog declarations, new kinds of expressions or extensions to existing expressions, and new built-in functions. Furthermore, both the static and the dynamic contexts are extended with new components that store information about collections, indexes, and integrity constraints. Collectively, all these extensions are called the Data Definition Facility.

As part of the implementation, Zorba includes new "built-in" modules that contain the declarations of all the new built-in functions to manage (ddl) and manipulate (dml) collections, indexes, and integrity constraints. As usual, these modules must be imported by any other module that wants to invoke any of the functions.

Collections

A collection is defined as an ordered set of documents that is disjoint from any other collection and is uniquely identified by a QName. Furthermore, with respect to document order, the relative order of two nodes belonging to different documents within the same collection is defined to be the same as the relative position of their containing documents within the collection. We will say that a node belongs to a collection if it is inside a document that belongs to that collection.

Like a W3C collection, a Zorba collection can be viewed as a sequence of nodes: it is the sequence containing the root nodes of the documents that belong to the collection (and as we will see later, the function cdml:collection returns exactly this sequence of nodes). However, even when viewed as sequences of nodes, Zorba collections differ from W3C collections in the following ways:
  • They contain parent-less nodes only.
  • They cannot contain any duplicate nodes.
  • Their nodes are in document order.
  • A node cannot be contained in more than one collection.
  • Zorba collections are identified by QNames, whereas W3C collections are identified by URIs.
For brevity, in the remainder of this document we will use the term "collection" to mean a Zorba collection. For backward compatibility with the W3C XQuery specification, Zorba retains some basic support for W3C collections (see http://zorba.io/modules/store/dynamic/collections/w3c/dml and http://zorba.io/modules/store/dynamic/collections/w3c/ddl). However, users are encouraged to use Zorba collections instead.

Zorba supports five kinds of operations on collections: collection declaration, collection creation, collection deletion, collection update, and node retrieval. These are explained briefly in the following simple example. Full details for each operation are provided in the subsequent chapters.

Collections in action - A simple example


Let us assume an application that models a news organization. The application models its data as XML documents grouped into collections of logically related entities. In this example, we show how three such collections may be created and used; the first collection contains employee data, the second contains news articles, and the third contains information about the months of the year (e.g., the name, number of days, and fixed holidays for each month).

Before a collection can be created, it must be declared. A collection declaration describes the collection by providing a unique name for it and specifying certain properties (using XQuery Annotations) for the collection itself and for the documents in the collection. As explained in Collection Declaration, collections must be declared inside library modules. In terms of the XQuery language, collection declarations become part of a module's static context.

In this example, the declarations are placed inside the "news-data" library module (shown below). The declarations assign the names news-data:employees, news-data:articles, and news-data:months to the three collections, respectively.

Documents in both the employees and the months collections are assumed to have a well-known structure, which is reflected in an XML schema ("news-schema"). The schema declares two global elements, one for employees and one for months. Accordingly, the collection declarations for employees and months specify that their root nodes are elements whose name and type match those of the corresponding global element declarations in "news-schema". In contrast, articles may come from various sources (including external ones), so article documents do not follow any particular schema. Therefore, the declaration for the articles collection specifies node() as the type of the root nodes.

Both employee and article documents may be updated during their lifetime. In contrast, the months-related information is fixed, so the nodes of the months collection are declared as 'an:read-only-nodes'. Furthermore, the collection itself is declared 'an:const', meaning that no months may be added to or deleted from this collection after it is created and initialized. Finally, we want the order of the month documents within their collection to be the same as the order of the months within the year. To achieve this, we declare the collection as 'an:ordered', so that when we later insert the month documents, the system will store and return them in insertion order. The position of employees or articles inside their respective collections has no special meaning for the application, so the corresponding declarations do not specify any ordering property. This allows the system to store and access the contents of these collections in whatever order it considers most efficient.
  (: The "news-data" Library Module :)

  module namespace news-data = "http://www.news.org/data";

  import schema namespace news-schema = "http://www.news.org/schemas";

  declare namespace an = "http://zorba.io/annotations";

  declare collection news-data:employees as schema-element(news-schema:employee)*;

  declare collection news-data:articles as node()*;

  declare %an:const %an:ordered %an:read-only-nodes collection news-data:months
    as schema-element(news-schema:month)*;

  declare variable $news-data:employees := xs:QName("news-data:employees");
  declare variable $news-data:articles := xs:QName("news-data:articles");
  declare variable $news-data:months := xs:QName("news-data:months");
Having been declared, the collections can now be created. Collection creation is illustrated by the "admin-script-1" script shown below. First, the collection descriptions must be made visible to the script. This is done by importing the "news-data" library module that contains the collection declarations. Then, the collections are created by calling the cddl:create function. There are two versions of this function: the first takes a QName as input and the second takes both a QName and a node-producing expression. In the first version, an empty document container is created by Zorba's storage system and registered inside a collections table that maps collection names to document containers. In the second version, the given expression is evaluated first, and (deep) copies are made of the nodes in the result sequence. This way, a sequence of distinct documents is produced. This is called the "insertion sequence". Then, as in the first version of the function, the document container is created and registered. Finally, the container is populated with the documents in the insertion sequence. In "admin-script-1", this second version is used to create and initialize the months collection. In fact, months must be initialized during creation because it is a constant collection, so no documents can be added to it later. The months are inserted in the collection in the order from January to December, and since the collection was declared as 'an:ordered', this order is preserved by the associated document container.
  (: "admin-script-1" :)

  import module namespace cddl = "http://zorba.io/modules/store/static/collections/ddl";

  import module namespace news-data = "http://www.news.org/data";

  cddl:create($news-data:employees);

  cddl:create($news-data:articles);

  cddl:create($news-data:months, (<month name="Jan">...</month>, ..., <month name="Dec">...</month>));
The next script ("user-script-1") shows how collections may be used. First the necessary modules and schemas are imported. Next, the employees collection is populated using the cdml:insert-nodes function. The first argument to this function is the QName of a collection, and the second is a node-producing expression (called the source expression). The QName is used to look up the collection declaration and the collection itself (i.e., its document container). Then, the nodes produced by the source expression (the source nodes) are copied and the copies are added to the document container, after checking that the actual type of each node matches the static type found in the collection declaration. Copying the source nodes (and their sub-trees) guarantees that the nodes in the insertion sequence are parent-less nodes that do not already belong to any other collection and are distinct from each other. Notice that the need to validate the root nodes against the type specified in the collection declaration is the reason why the "news-schema" must be imported, even though no type defined by the schema is referenced explicitly in the query.

In this example, the employees collection is populated by a single call to the cdml:insert-nodes function, whose source expression is a concatenation of explicitly constructed documents. The articles collection is populated using the cdml:insert-nodes function as well, but in a slightly different fashion: the article documents are assumed to exist already, either as text files in the local filesystem or at various web sites. As a result, the articles collection is populated via a concatenation of cdml:insert-nodes calls, each reading and parsing a single XML document and inserting the generated XML tree in the collection. Although there is one function call per article, the articles will be inserted all together in an atomic (all-or-nothing) operation, when the ";" that closes the parenthesized sequence of calls is processed. This is because, as explained in Updating Collections, the cdml:insert-nodes function (like all other functions that create, delete, or update collections) is an updating function: rather than applying the insertion immediately, it produces an update primitive that becomes part of a pending update list (PUL), which is applied atomically when the next ";" appears in the program.

After populating the two collections, "user-script-1" runs a query expression that uses the cdml:collection function to access their root nodes. The expression returns, for each journalist, the articles authored by that journalist ordered by their date.

Finally, "user-script-1" uses the cdml:delete-nodes function to remove from the articles collection all articles that were published before 2000. Unlike cdml:insert-nodes, cdml:delete-nodes as called in the script takes only a node-producing source expression. The source nodes must be parent-less nodes that belong to a collection. The function looks up the declaration and the container of the collection that holds the source nodes, and removes the source nodes from that container.
  (: "user-script-1":)

  import module namespace cdml = "http://zorba.io/modules/store/static/collections/dml";

  import module namespace http = "http://www.zorba-xquery.com/modules/http-client";

  import schema namespace news-schema = "http://www.news.org/schemas";

  import module namespace news-data = "http://www.news.org/data";

  cdml:insert-nodes($news-data:employees, (<employee id="100">...</employee>, ..., <employee id="500">...</employee>));

  (
    cdml:insert-nodes($news-data:articles, doc("article1.xml")/article),
    cdml:insert-nodes($news-data:articles, http:get("http://www.reuters.com/article234.xhtml")//article),
    ....,
    cdml:insert-nodes($news-data:articles, doc("article100.xml")/article)
  );

  for $emp in cdml:collection($news-data:employees)[./position/@kind eq "journalist"]
  let $articles := for $art in cdml:collection($news-data:articles)[.//author//name eq $emp/name]
                   order by $art//date
                   return $art
  return <result>{$emp}<articles>{$articles//title}</articles></result>;

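  (: xs:date requires the YYYY-MM-DD lexical form; the untyped date value is cast before comparing :)
  cdml:delete-nodes(cdml:collection($news-data:articles)[xs:date(.//date) lt xs:date("2000-01-01")]);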
We conclude this example with the "admin-script-2" script, which simply destroys the collections using the cddl:delete function. The function de-registers the collection from the collections table, destroys all the documents in the collection and all the indexes and integrity constraints associated with the collection, and finally destroys the document container itself.
  (: "admin-script-2" :)

  import module namespace cddl = "http://zorba.io/modules/store/static/collections/ddl";

  import module namespace news-data = "http://www.news.org/data";

  cddl:delete($news-data:employees);

  cddl:delete($news-data:articles);

  cddl:delete($news-data:months);

Collection Declaration

  AnnotatedDecl
          ::= 'declare' ( CompatibilityAnnotation | Annotation )*
              ( VarDecl | FunctionDecl | CollectionDecl | IndexDecl | ICDecl )
  CollectionDecl ::= 'collection' EQName CollectionTypeDecl?

  CollectionTypeDecl ::= 'as' KindTest OccurrenceIndicator?
Collections are defined by collection declaration statements, which specify a unique name for a collection as a QName, a set of collection annotations (see Annotations on Collections and Indexes), and the collection's static type. Syntactically, collection declarations are placed inside module prologs. The Prolog syntax is extended accordingly, as shown above. An additional constraint (not expressible syntactically) is that only library modules may contain collection declarations [zerr:ZDST0003]. This is because library modules can be shared among queries, whereas if a collection were declared inside a main module, every other query that wanted to use this collection would have to redeclare it in its own main module. Worse, allowing collection declarations in "user" queries can lead to "data leaks": a collection declared and created by a user query and not destroyed by the same query will be unknown to the rest of the application, and may stay in the database indefinitely. In contrast, library modules containing declarations are expected to be under the jurisdiction of a system administrator who makes sure that queries see the data that they must see, and that no data inconsistencies or leaks can arise.

To accommodate collection declarations, Zorba extends the static context with a component called the statically known collections. This is a map whose entries associate an expanded QName with an implementation-dependent representation of the information contained in a collection declaration with the same QName. The effect of a collection declaration is to add an entry to the statically known collections of the module containing the declaration. If the expanded QName of the collection is equal (as defined by the eq operator) to the expanded QName of another collection in the statically known collections of the same module, a static error is raised [zerr:ZDST0001]. Like variables and functions, the statically known collections of a module that is imported by another module are copied into the statically known collections of the importing module. It is a static error [zerr:ZDST0002] if the expanded QName of a collection declared in an imported module is equal (as defined by the eq operator) to the expanded QName of a collection declared in the importing module or in another imported module (even if the declarations are consistent).

Zorba defines three categories of collection annotations: update mode (with possible values 'an:const', 'an:mutable', 'an:append-only', or 'an:queue'), ordering mode (with possible values 'an:ordered' or 'an:unordered'), and document update mode (with possible values 'an:read-only-nodes' or 'an:mutable-nodes'). If not specified, the default values for update and ordering mode are 'an:mutable' and 'an:unordered', respectively. The default value for the document update mode is 'an:mutable-nodes'.

It is a static error [err:XQST0106] if a collection declaration contains more than one value for the same property. An ordered collection is a collection in which the ordering of documents is assumed to be meaningful for the application, and as a result, programmers can explicitly control the placement of documents via appropriate updating functions. In contrast, the ordering of documents inside unordered collections is implementation-dependent, but stable (see Accessing Collections for details). A constant collection is one that is created with an initial set of documents and does not allow any subsequent insertions to or deletions from this initial set.

An 'an:append-only' collection does not allow any deletions at all and restricts insertions to take place at the "end" only, i.e., all new documents must be inserted after all existing ones. This implies a user-visible document ordering, and as a result, an 'an:append-only' collection must also be declared as 'an:ordered' [err:XQST0106]. An 'an:queue' collection forbids both insertions and deletions in/from the "middle"; only documents at the front of the collection may be deleted, and new documents can be inserted only at the end of the collection. Like 'an:append-only' collections, 'an:queue' collections must be declared as 'an:ordered' [err:XQST0106]. If the document update mode of a collection is 'an:read-only-nodes', then an error is raised [zerr:ZDDY0010] every time a node of the collection appears as the target node of an updating expression; otherwise no such error is raised.

In addition to the annotations described above, a collection declaration also specifies the collection static type, i.e., the static type for the result of the cdml:collection function. This is specified as a sequence type that adheres to the syntax and semantics of a KindTest plus an (optional) occurrence indicator. If no static type is specified, it is assumed to be document-node(element(*, xs:untyped))*. The static type without the occurrence indicator is the static type of the collection's root nodes.
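As an illustration of how the annotation categories and the default values combine, the following sketch declares three further collections. The module namespace and the collection names are hypothetical and serve only as an example; the annotations and the declaration syntax are the ones described above.
  (: Hypothetical library module; all names are for illustration only :)

  module namespace log-data = "http://www.news.org/log-data";

  declare namespace an = "http://zorba.io/annotations";

  (: append-only implies a user-visible ordering, so an:ordered is required :)
  declare %an:append-only %an:ordered collection log-data:audit-trail as node()*;

  (: queue: insert at the end, delete from the front; also requires an:ordered :)
  declare %an:queue %an:ordered %an:read-only-nodes collection log-data:outbox as element()*;

  (: no annotations and no type: defaults to an:mutable, an:unordered, an:mutable-nodes,
     with static type document-node(element(*, xs:untyped))* :)
  declare collection log-data:scratch;
Omitting 'an:ordered' from either of the first two declarations, or combining two values from the same category (for example 'an:ordered' with 'an:unordered'), would be rejected with [err:XQST0106] according to the rules above.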

Creating Collections


As explained already, collections are just sets of parent-less XML trees (called "documents"). In terms of the language, these sets "live" in the dynamic context. In particular, the dynamic context is extended with a component called the available collections. This is a map whose entries associate the expanded QName of a collection with the collection's document set. If an entry for a collection appears in the available collections of a module, the collection is said to be available to that module.

In practice, the available collections component is implemented by the storage system of Zorba. To begin with, each document set is implemented by some appropriate data structure that acts as a document container. The description of potential data structures is beyond the scope of this document, but the choice will, in general, depend on the properties of the collection and the contained documents. In addition to managing the document containers, the store maintains a collections table, which maps collection names to document containers. The collections table is accessible by all queries, so once an entry is added to the table, the associated collection is assumed to be available to every query and every module that participates in the execution of that query.

Creation of a collection involves creating an initially empty document container and "registering" that container in the collections table. We provide two functions for creating collections. Both are updating functions, so instead of actually performing the updates, they generate pending update primitives that become part of a pending update list (PUL) to be applied at a later time (see Extensions to the XQUF updates). The functions and their associated update primitives are described below:
  declare updating function cddl:create($collectionName as xs:QName)

  upd:createCollection($collectionName as xs:QName)
The function is evaluated as follows:
  • If the given expanded QName does not identify a collection among the statically known collections in the static context of the invoking module, an error is raised [zerr:ZDDY0001].
  • If the given expanded QName identifies a collection that is available already, an error is raised [zerr:ZDDY0002].
  • The result of the function is an empty XDM instance and a pending update list that consists of a single update primitive: upd:createCollection($collectionName).
The update primitive is applied as follows:
  • An empty document container is created.
  • An entry is added to the collections table. The entry maps the collection's expanded QName to the document container.
The second create function creates the collection and populates it with an initial set of trees.
  declare updating function cddl:create($collectionName as xs:QName, $nodes as node()*)
The function is evaluated as follows:
  • If the given expanded QName does not identify a collection among the statically known collections in the static context of the invoking module, an error is raised [zerr:ZDDY0001].
  • If the given expanded QName identifies a collection that is available already, an error is raised [zerr:ZDDY0002].
  • The expression that is given as the second argument to the function call is evaluated. The result of the evaluation is called the source sequence. If the source sequence contains an item that is not a node, or a node whose actual type does not match the root static type specified in the collection declaration, a type error is raised [zerr:XDTY0001].
  • Each of the nodes in the source sequence is copied as if it was a node returned by an enclosed expression in a direct element constructor (see http://www.w3.org/TR/xquery/#id-content). The construction and copy-namespaces modes used during the copy operation are the ones in the static context of the invoking module. Let $nodes be the sequence containing the copied nodes. Every node in $nodes is a root (parent-less) node that does not belong to any collection and is distinct from any other node in $nodes.
  • The result of the function is an empty XDM instance and a pending update list that consists of the following update primitives: upd:createCollection($collectionName) and upd:insertFirstIntoCollection($collectionName, $nodes).
The upd:createCollection primitive was described above. The upd:insertFirstIntoCollection primitive will be described in Updating Collections, in the context of the cdml:insert-nodes-first function.
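A minimal sketch of the two-argument form, written with the cddl:create name used in "admin-script-1" and reusing the "news-data" module from the earlier example; the article content is made up for illustration. Because the creation and insertion primitives are part of the same pending update list, the collection should become available and populated together when the ";" is processed, so the expression that follows can read it.
  import module namespace cddl = "http://zorba.io/modules/store/static/collections/ddl";

  import module namespace cdml = "http://zorba.io/modules/store/static/collections/dml";

  import module namespace news-data = "http://www.news.org/data";

  (: creation and initial population are requested in a single PUL :)
  cddl:create($news-data:articles,
              (<article><title>First</title></article>,
               <article><title>Second</title></article>));

  (: evaluated after the PUL has been applied: counts the root nodes :)
  count(cdml:collection($news-data:articles))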

Accessing Collections

To access the root nodes of a collection, the cdml:collection function is provided.
  declare function cdml:collection($collectionName as xs:QName) as node()*
The function is evaluated as follows:
  • If the given expanded QName does not identify a collection among the statically known collections in the static context of the invoking module, an error is raised [zerr:ZDDY0001].
  • If the given expanded QName does not identify a collection among the available collections in the dynamic context of the invoking module, an error is raised [zerr:ZDDY0003].
  • The result of the function is a sequence consisting of the root nodes in the collection. If the collection is declared as 'an:ordered', the ordering of the nodes in the result will reflect the order in which nodes were inserted into the collection by the node insertion functions (see Updating Collections). If the collection is declared as 'an:unordered', the ordering of the nodes in the result is implementation-dependent. In both cases, the nodes in the sequence are, by definition, in document order. For unordered collections, this document ordering is guaranteed to be stable within a query snapshot (i.e., until the next time updates are applied). For ordered collections, the document ordering is stable "forever" (i.e., two root nodes in the collection will compare the same as long as the collection and the nodes exist).
Another non-updating function that accesses a collection implicitly is the index-of function:
  declare function cdml:index-of($node as node()) as xs:integer
The function is evaluated as follows:
  • If the given node is not a root node of a collection, an error is raised [zerr:ZDDY0011].
  • The result of this function is the position as xs:integer of the given node within its collection.
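The two access functions can be combined as in the following sketch, which reuses the months collection from the earlier example. The function is spelled cdml:index-of here, following the name used in the prose, and the name attribute is the one shown on the month elements in "admin-script-1"; the entry element is made up for illustration.
  import module namespace cdml = "http://zorba.io/modules/store/static/collections/dml";

  import module namespace news-data = "http://www.news.org/data";

  (: months is an ordered collection, so the nodes come back in insertion order :)
  for $month in cdml:collection($news-data:months)
  return <entry name="{$month/@name}" position="{cdml:index-of($month)}"/>
Passing a node that is not a collection root (for example $month/@name) to the position function would raise [zerr:ZDDY0011].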

Updating Collections

A collection update is an operation that either inserts or deletes a number of root nodes (and their subtrees) to/from a collection. Zorba provides five updating functions that insert root nodes, and another five updating functions that delete root nodes. All of these functions are updating functions (in the terminology of the XQUF). As a result, rather than applying the update immediately, they produce an update primitive that becomes part of a pending update list (PUL), which is applied atomically when the next ";" appears in a script. The signature and semantics of each function and its associated update primitive are described in this section. The order in which the various update primitives are applied and the constraints on how update primitives may be combined in a PUL are described in Extensions to the XQUF updates.

In addition to the updating insert functions, Zorba also provides five sequential insert functions (i.e., cdml:apply-insert-nodes, cdml:apply-insert-nodes-first, cdml:apply-insert-nodes-last, cdml:apply-insert-nodes-before, and cdml:apply-insert-nodes-after). These sequential counterparts apply the update primitive implicitly and return the nodes that were inserted into the collection. This is especially useful because nodes are copied before they are inserted into a collection, so the returned nodes are the copies that reside in the collection rather than the source nodes.
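The following sketch shows why the returned node matters. It assumes Zorba's scripting syntax for local variable declarations ("variable $x := ...;") and reuses the articles collection from the earlier example; the article content is made up.
  import module namespace cdml = "http://zorba.io/modules/store/static/collections/dml";

  import module namespace news-data = "http://www.news.org/data";

  (: made-up article used only for this illustration :)
  variable $draft := <article><title>Draft</title></article>;

  (: sequential insert: the result is the copy that now resides in the collection :)
  variable $stored := exactly-one(cdml:apply-insert-nodes($news-data:articles, $draft));

  (: compares node identity of the stored copy and the source node :)
  $stored is $draft
Since the source node is copied before insertion, the comparison should evaluate to false: $stored is a root node of the collection, while $draft remains a free-standing node outside of it.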
  declare updating function cdml:insert-nodes($collectionName as xs:QName, $nodes as node()*)

  upd:insertIntoCollection($collectionName as xs:QName, $nodes as node()*)
The insert-nodes function is evaluated as follows:
  • If the given expanded QName does not identify a collection among the statically known collections in the static context of the invoking module, an error is raised [zerr:ZDDY0001].
  • If the given expanded QName does not identify a collection among the available collections in the dynamic context of the invoking module, an error is raised [zerr:ZDDY0003].
  • If the update mode of the collection is 'an:const', 'an:append-only', or 'an:queue', an error is raised [zerr:ZDDY0004], [zerr:ZDDY0005], or [zerr:ZDDY0006], respectively.
  • The expression that is given as the second argument to the function call is evaluated. The result of the evaluation is called the source sequence. If the source sequence contains an item that is not a node, or a node whose actual type does not match the KindTest specified in the collection declaration, a type error is raised [zerr:XDTY0001].
  • Each of the nodes in the source sequence is copied as if it was a node returned by an enclosed expression in a direct element constructor (see http://www.w3.org/TR/xquery/#id-content). The construction and copy-namespaces modes used during the copy operation are the ones in the static context of the invoking module. Let $nodes be the sequence containing the copied nodes. Every node in $nodes is a parent-less root node that does not belong to any collection and is distinct from any other node in $nodes.
  • The result of the function is an empty XDM instance and a pending update list that consists of a single update primitive: upd:insertIntoCollection($collectionName, $nodes).
The update primitive is applied as follows:
  • The document container for the collection is found via the collections table.
  • The root nodes in $nodes are inserted into the container. If the collection is an ordered one, then all the nodes are inserted next to each other and in the same order as they appear in $nodes. The position of the first node to be inserted is implementation-dependent. The relative positions of pre-existing root nodes do not change as a result of the insertions. If the collection is an unordered one, each node is inserted in some implementation-dependent position. Furthermore, the relative positions of pre-existing root nodes may change as a result of the insertions.
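As a sketch of the update-mode check, the call below targets the months collection from the earlier example, which was declared 'an:const'. The month element is made up for illustration.
  import module namespace cdml = "http://zorba.io/modules/store/static/collections/dml";

  import module namespace news-data = "http://www.news.org/data";

  (: news-data:months is declared %an:const, so this call is expected to raise zerr:ZDDY0004 :)
  cdml:insert-nodes($news-data:months, <month name="Extra"/>);
In the evaluation steps listed above, the update-mode check comes before the source expression is type-checked, so the error reported here would be [zerr:ZDDY0004] even though the inserted element has not been validated against "news-schema".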
  declare updating function cdml:insert-nodes-first($collectionName as xs:QName, $nodes as node()*)

  upd:insertFirstIntoCollection($collectionName as xs:QName, $nodes as node()*)
The insert-nodes-first function is evaluated as follows:
  • If the given expanded QName does not identify a collection among the statically known collections in the static context of the invoking module, an error is raised [zerr:ZDDY0001].
  • If the given expanded QName does not identify a collection among the available collections in the dynamic context of the invoking module, an error is raised [zerr:ZDDY0003].
  • If the update mode of the collection is 'an:const', 'an:append-only', or 'an:queue', an error is raised [zerr:ZDDY0004], [zerr:ZDDY0005], or [zerr:ZDDY0006], respectively.
  • If the collection is 'an:unordered', an error is raised [zerr:ZDDY0012].
  • The expression that is given as the second argument to the function call is evaluated. The result of the evaluation is called the source sequence. If the source sequence contains an item that is not a node, or a node whose actual type does not match the KindTest specified in the collection declaration, a type error is raised [zerr:XDTY0001].
  • Each of the nodes in the source sequence is copied as if it was a node returned by an enclosed expression in a direct element constructor (see http://www.w3.org/TR/xquery/#id-content). The construction and copy-namespaces modes used during the copy operation are the ones in the static context of the invoking module. Let $nodes be the sequence containing the copied nodes. Every node in $nodes is a parent-less root node that does not belong to any collection and is distinct from any other node in $nodes.
  • The result of the function is an empty XDM instance and a pending update list that consists of a single update primitive: upd:insertFirstIntoCollection($collectionName, $nodes).
The update primitive is applied as follows:
  • The document container for the collection is found via the collections table.
  • The root nodes in $nodes are inserted at the "beginning" of the container. Specifically, the first node is inserted at the first position, and the rest of the nodes are inserted after the first one and in the same order as they appear in $nodes.
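The placement rule can be traced with two consecutive calls. The headlines collection and the h elements are hypothetical; the assumed declaration is spelled out in the leading comment.
  (: Assumes hypothetical declarations in the "news-data" module:
       declare %an:ordered collection news-data:headlines as node()*;
       declare variable $news-data:headlines := xs:QName("news-data:headlines");
     and that the collection has been created and is still empty. :)

  import module namespace cdml = "http://zorba.io/modules/store/static/collections/dml";

  import module namespace news-data = "http://www.news.org/data";

  (: after this statement the collection holds: a, b :)
  cdml:insert-nodes-first($news-data:headlines, (<h>a</h>, <h>b</h>));

  (: the new nodes go in front of the existing ones: c, d, a, b :)
  cdml:insert-nodes-first($news-data:headlines, (<h>c</h>, <h>d</h>));

  for $h in cdml:collection($news-data:headlines)
  return string($h)
Each call keeps the relative order of its own source nodes, and the second batch is placed in front of the first, which is why the collection reads c, d, a, b afterwards.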
  declare updating function cdml:insert-nodes-last($collectionName as xs:QName, $nodes as node()*)

  upd:insertLastIntoCollection($collectionName as xs:QName, $nodes as node()*)
The insert-nodes-last function is evaluated the same way as the insert-nodes-first function except:
  • If the collection is 'an:append-only' or 'an:queue', the insertion is allowed (i.e., the errors ZDDY0005 or ZDDY0006 are not raised).
  • The result of the function is an empty XDM instance and a pending update list that consists of a single update primitive: upd:insertLastIntoCollection($collectionName, $nodes).
The update primitive is applied as follows:
  • The document container for the collection is found via the collections table.
  • The root nodes in $nodes are inserted at the "end" of the container. Specifically, the first node is inserted after the last existing node, and the rest of the nodes are inserted after the first one and in the same order as they appear in $nodes.
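The relaxed update-mode rule can be sketched with an append-only collection. The audit-trail collection and the event elements are hypothetical; the assumed declaration is given in the leading comment.
  (: Assumes hypothetical declarations in the "news-data" module:
       declare %an:append-only %an:ordered collection news-data:audit-trail as node()*;
       declare variable $news-data:audit-trail := xs:QName("news-data:audit-trail");
     and that the collection has already been created. :)

  import module namespace cdml = "http://zorba.io/modules/store/static/collections/dml";

  import module namespace news-data = "http://www.news.org/data";

  (: allowed: append-only collections accept insertions at the end :)
  cdml:insert-nodes-last($news-data:audit-trail, <event kind="login"/>);

  (: by contrast, the following call would raise zerr:ZDDY0005:
     cdml:insert-nodes-first($news-data:audit-trail, <event kind="logout"/>); :)
For a collection declared 'an:queue', the same pattern applies, with [zerr:ZDDY0006] as the error raised by the other insert functions.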
  declare updating function cdml:insert-nodes-before($collectionName as xs:QName, $target as node(), $nodes as node()*)

  upd:insertBeforeIntoCollection($collectionName as xs:QName, $target as node(), $nodes as node()*)
The insert-nodes-before function is evaluated as follows:
  • If the given expanded QName does not identify a collection among the statically known collections in the static context of the invoking module, an error is raised [zerr:ZDDY0001].
  • If the given expanded QName does not identify a collection among the available collections in the dynamic context of the invoking module, an error is raised [zerr:ZDDY0003].
  • If the update mode of the collection is 'an:const', 'an:append-only', or 'an:queue', an error is raised [zerr:ZDDY0004], [zerr:ZDDY0005], or [zerr:ZDDY0006], respectively.
  • If the collection is 'an:unordered', an error is raised [zerr:ZDDY0012].
  • The expression that appears as the second argument to the function call is evaluated. The expression must return a single node, called the target node. If the target node is not a root node that belongs to the collection, an error is raised [zerr:ZDDY0011].
  • The expression that is given as the third argument to the function call is evaluated. The result of the evaluation is called the source sequence. If the source sequence contains an item that is not a node, or a node whose actual type does not match the KindTest specified in the collection declaration, a type error is raised [zerr:XDTY0001].
  • Each of the nodes in the source sequence is copied as if it was a node returned by an enclosed expression in a direct element constructor (see http://www.w3.org/TR/xquery/#id-content). The construction and copy-namespaces modes used during the copy operation are the ones in the static context of the invoking module. Let $nodes be the sequence containing the copied nodes. Every node in $nodes is a parent-less root node that does not belong to any collection and is distinct from any other node in $nodes.
  • The result of the function is an empty XDM instance and a pending update list that consists of a single update primitive: upd:insertBeforeIntoCollection($collectionName, $target, $nodes).
The update primitive is applied as follows:
  • The document container for the collection is found via the collections table.
  • The root nodes in $nodes are inserted into the container before the given target node. Specifically, if the target node is at position K, the first node is inserted at position K, and the rest of the nodes are inserted after the first one and in the same order as they appear in $nodes. After the insertion, the target node will be at position K+N, where N is the number of nodes in $nodes.
  declare updating function cdml:insert-nodes-after($collectionName as xs:QName, $target as node(), $nodes as node()*)

  upd:insertAfterIntoCollection($collectionName as xs:QName, $target as node(), $nodes as node()*)
The insert-nodes-after function is evaluated the same way as the insert-nodes-before function except:
  • The result of the function is an empty XDM instance and a pending update list that consists of a single update primitive: upd:insertAfterIntoCollection($collectionName, $target, $nodes).
The update primitive is applied as follows:
  • The document container for the collection is found via the collections table.
  • The root nodes in $nodes are inserted into the container after the given target node. Specifically, if the target node is at position K, the first node is inserted at position K+1, and the rest of the nodes are inserted after the first one and in the same order as they appear in $nodes.
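The positional insertion described above might be used as in the following sketch. It assumes that the articles collection of the news example is declared as ordered and mutable and that it is not empty (an empty collection would yield no target node); the article content is hypothetical.

```xquery
import module namespace cdml = "http://zorba.io/modules/store/static/collections/dml";

import module namespace news-data = "http://www.news.org/data" at "news_data.xqlib";

(: The target must be a root node of the collection; cdml:collection returns exactly those. :)
let $first := cdml:collection($news-data:articles)[1]
return
  (: The new node takes position 1 and the former first node moves to position 2. :)
  cdml:insert-nodes-before($news-data:articles, $first,
    <article><empid>007</empid><title>Front page</title></article>)
```

Replacing cdml:insert-nodes-before with cdml:insert-nodes-after in the same query would place the new node at position 2 instead.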
  declare updating function cdml:delete-nodes($nodes as node()*)

  upd:deleteFromCollection($nodes as node()*)
The delete-nodes function is evaluated as follows:
  • The expression that appears as the first argument to the function call is evaluated. The result of this evaluation is called the deletion sequence. If there is any node in the deletion sequence that is not a root node belonging to a collection, an error is raised [zerr:ZDDY0011]. Let $nodes be the deletion sequence.
  • If the update mode of the collection containing any node in $nodes is 'an:const', 'an:append-only', or 'an:queue', an error is raised [zerr:ZDDY0004], [zerr:ZDDY0007], or [zerr:ZDDY0009], respectively.
  • The result of the function is an empty XDM instance and a pending update list that consists of a single update primitive: upd:deleteFromCollection($nodes).
The update primitive is applied as follows:
  • The document container for the collection is found via the collections table.
  • Each document that is rooted at a node in $nodes is removed from the container, if it is still there (earlier delete primitives in the same PUL may have deleted the tree already). If there are no variables that are bound to any of the document's nodes, the document is destroyed. Otherwise, the document will be destroyed as soon as there are no variables bound to any of its nodes.
  declare updating function cdml:delete-nodes-first($collectionName as xs:QName, $number as xs:unsignedLong)
The delete-nodes-first function is evaluated as follows:
  • If the given expanded QName does not identify a collection among the statically known collections in the static context of the invoking module, an error is raised [zerr:ZDDY0001].
  • If the given expanded QName does not identify a collection among the available collections in the dynamic context of the invoking module, an error is raised [zerr:ZDDY0003].
  • If the update mode of the collection is 'an:const' or 'an:append-only', an error is raised [zerr:ZDDY0004] or [zerr:ZDDY0007], respectively.
  • The expression that appears as the second argument to the function call is evaluated, producing a single positive integer. Let $number be that integer.
  • If the collection has fewer than $number nodes, an error is raised [zerr:ZDDY0011].
  • Let $nodes be the sequence consisting of the first $number root nodes in the collection.
  • The result of the function is an empty XDM instance and a pending update list that consists of a single update primitive: upd:deleteFromCollection($collectionName, $nodes).
  declare updating function cdml:delete-node-first($collectionName as xs:QName)
The delete-node-first function is a special case of the delete-nodes-first function. Specifically, delete-node-first($collectionName) is equivalent to delete-nodes-first($collectionName, 1).
  declare updating function cdml:delete-nodes-last($collectionName as xs:QName, $number as xs:unsignedLong)
The delete-nodes-last function is evaluated as follows:
  • If the given expanded QName does not identify a collection among the statically known collections in the static context of the invoking module, an error is raised [zerr:ZDDY0001].
  • If the given expanded QName does not identify a collection among the available collections in the dynamic context of the invoking module, an error is raised [zerr:ZDDY0003].
  • If the update mode of the collection is 'an:const', 'an:append-only', or 'an:queue', an error is raised [zerr:ZDDY0004], [zerr:ZDDY0007], or [zerr:ZDDY0009], respectively.
  • The expression that appears as the second argument to the function call is evaluated, producing a single positive integer. Let $number be that integer.
  • If the collection has fewer than $number nodes, an error is raised [zerr:ZDDY0011].
  • Let $nodes be the sequence consisting of the last $number root nodes in the collection.
  • The result of the function is an empty XDM instance and a pending update list that consists of a single update primitive: upd:deleteFromCollection($collectionName, $nodes).
  declare updating function cdml:delete-node-last($collectionName as xs:QName)
The delete-node-last function is a special case of the delete-nodes-last function. Specifically, delete-node-last($collectionName) is equivalent to delete-nodes-last($collectionName, 1).
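The deletion functions could be combined as in the sketch below, which again uses the articles collection of the news example. It assumes that the collection is mutable and contains at least one node, and that each article carries an empid element somewhere below its root, as in the index examples.

```xquery
import module namespace cdml = "http://zorba.io/modules/store/static/collections/dml";

import module namespace news-data = "http://www.news.org/data" at "news_data.xqlib";

(: Remove the first root node of the collection. :)
cdml:delete-node-first($news-data:articles);

(: Remove every root node that contains an article written by employee 007. :)
(: The predicate is an assumption about the shape of the article documents. :)
cdml:delete-nodes(cdml:collection($news-data:articles)[.//empid = "007"]);
```

Note that cdml:delete-nodes takes no collection name: the nodes passed to it must already be root nodes of some collection, which is why they are selected via cdml:collection.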

Destroying Collections

To destroy a collection, Zorba provides the delete updating function. The function itself and its associated update primitive are described below.
  declare updating function cddl:delete($collectionName as xs:QName)

  upd:deleteCollection($collectionName as xs:QName)
The delete function is evaluated as follows:
  • If the given expanded QName does not identify a collection among the statically known collections in the static context of the invoking module, an error is raised [zerr:ZDDY0001].
  • If the given expanded QName does not identify a collection among the available collections in the dynamic context of the invoking module, an error is raised [zerr:ZDDY0003].
  • The result of the function is an empty XDM instance and a pending update list that consists of a single update primitive: upd:deleteCollection($collectionName).
The update primitive is applied as follows:
  • If there is any available index whose domain expression or any of its key expressions reference the collection, an error is raised [zerr:ZDDY0013].
  • If there is any active integrity constraint on the collection, an error is raised [zerr:ZDDY0014].
  • If there is any in-scope variable that references any node in the collection, an error is raised [zerr:ZDDY0015].
  • The document container for the collection is found via the collections table.
  • All documents in the container are destroyed.
  • The container itself is destroyed.
  • The entry mapping the collection name to its container is removed from the collections table.
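Since the update primitive fails with [zerr:ZDDY0013] while an index still references the collection, indexes have to be removed before the collection itself. The following sketch shows one possible order for the employees collection and the three indexes of the news example that are built on it. The namespace URI bound to the cddl prefix is an assumption made by analogy with the other module URIs in this document.

```xquery
(: Assumed module URI for the collection ddl functions. :)
import module namespace cddl = "http://zorba.io/modules/store/static/collections/ddl";

import module namespace iddl = "http://zorba.io/modules/store/static/indexes/ddl";

import module namespace news-data = "http://www.news.org/data" at "news_data.xqlib";

(: First remove every index whose domain or key expressions reference the collection. :)
iddl:delete($news-data:CityEmp);

iddl:delete($news-data:ArtCountEmp);

iddl:delete($news-data:EmpMgr);

(: Only then can the collection, its documents, and its container be destroyed. :)
cddl:delete($news-data:employees);
```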

Indexes


Zorba supports two kinds of indexes, value indexes and general indexes. As shown in Indexes in action - A simple example, value indexes can be used to optimize queries involving value comparisons, whereas general indexes can be used to optimize queries involving value and/or general comparisons. Although general indexes can handle both kinds of comparisons, value indexes are more compact and efficient, and as a result, they should be preferred over general indexes for data on which no general comparisons are expected.

A value index is a set whose contents (called index entries) are defined by a "domain" expression and a number of "key" expressions. Informally, a value index is created by evaluating its domain expression first, resulting in a sequence of nodes (called the index domain sequence). Then, for each node D in the domain sequence, the key expressions are evaluated with node D serving as their context node. A key expression must not return more than one value. If a value returned by a key expression is not atomic, it is converted to an atomic value via atomization. Thus, if N is the number of key expressions, then for each domain node, an associated key tuple of N atomic values is constructed. The purpose of the index is to map key tuples to domain nodes. In general, several domain nodes may produce the same key tuple. As a result, each index entry is a pair consisting of a key tuple and the set of domain nodes that produced the key tuple.
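The mapping from key tuples to sets of domain nodes can be mimicked in plain XQuery. The sketch below is only an illustration of the resulting entries, not of how Zorba builds or stores an index; the in-memory employee elements are hypothetical and stand in for a domain sequence, and city plays the role of a single key expression.

```xquery
(: Hypothetical in-memory data standing in for the index domain sequence. :)
let $domain := (
  <employee><id>1</id><city>Paris</city></employee>,
  <employee><id>2</id><city>Beijing</city></employee>,
  <employee><id>3</id><city>Paris</city></employee>
)
(: One entry per distinct key; each entry lists the domain nodes that produced that key. :)
for $key in distinct-values($domain/city)
return
  <entry key="{$key}">{ $domain[city eq $key]/id }</entry>
```

The query yields two entries: one for Paris that lists employees 1 and 3, and one for Beijing that lists employee 2.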

General comparison operators accept operands that are sequences potentially containing more than one item. As a result, the main difference between value and general indexes is that the latter allow a key expression to return multiple values with potentially different data types. On the other hand, for simplicity, the current Zorba implementation restricts the number of key expressions for general indexes to one expression only.

Like value indexes, general indexes are sets of index entries, where each index entry is a pair consisting of an atomic key value and the set of associated domain nodes. Informally, the set of entries for a general index is created by evaluating its domain expression first, resulting in a sequence of domain nodes. Then, for each node D in the domain sequence, the key expression is evaluated with node D serving as its context node. A key expression may return a sequence of arbitrary number of items, called the key sequence. Items in the key sequence may have different data types. If an item in the key sequence is a node, it is converted to one or more atomic values via atomization, and the atomic values replace the node in the key sequence. If an item in the key sequence has type xs:untypedAtomic, it is removed from the key sequence and is cast to every other atomic built-in type. Then, for each successful cast, the resulting atomic value is put into the key sequence. Thus, for each domain node D, a key sequence is constructed that contains atomic values none of which has type xs:untypedAtomic. For each value K in this key sequence, the pair [K, D] is inserted in the index. If an entry for K exists already, D is inserted in the associated set of domain nodes; otherwise a new index entry is created, mapping K to the set { D }.
Zorba supports the following five operations on indexes: declaration, creation, deletion, probing and maintenance. These are explained briefly in the following simple example. Full details for each operation are provided in the subsections after the example.

Indexes in action - A simple example


Let us consider the same news application we used in Collections in action - A simple example. In this example, we will show how to create and use indexes on the collections of the news organization. First, let us assume that each employee has a city where he/she is currently stationed. We want to create an index that maps city names to the employees that are stationed in those cities. The index will contain one entry for each city where at least one employee is stationed. Let us also assume that we want to search for journalists based on the number of articles they have written. For this, we will create an index that maps article counts to the employees who are journalists and have produced that number of articles. Finally, we want to be able to quickly find the manager of any given employee. For this, we will create an index that maps employee ids to the manager of the associated employee.

Before an index can be created, it must be declared. An index declaration describes the index by providing its domain expression, its key expressions, and certain index properties (declared as annotations); it also specifies a name for referencing the index in subsequent operations. Like collections, indexes must be declared inside the prolog of library modules. In terms of the XQuery language, index declarations become part of a module's static context.

In this example, the index declarations are placed inside the "news-data" library module shown below (same as the module we saw in Collections in action - A simple example, except for the additional index declarations). The first index declaration assigns the name news-data:CityEmp to the index. It uses the "on nodes" and "by" keywords to specify the domain and key expressions, respectively. The "as" keyword specifies a target atomic data type that the result of the key expression must match (after atomization). The index is declared as an 'an:value-equality' index. This means that it can be used to find the employees in a particular city, but not in a "range" of cities. In other words, the index is not aware of any ordering among city names. Finally, the maintenance property of the index is set to "automatically maintained" ('an:automatic'). Briefly, an automatically maintained index is one whose maintenance is the responsibility of Zorba rather than of the XQuery programmers.

The second index declaration assigns the name news-data:ArtCountEmp to the index. Its domain expression selects all employees who are journalists. Its key expression computes the number of articles written by the "current" journalist. This index is declared as a "value range" ('an:value-range') index, which means that it can be used to find journalists whose article count is within a given range. Finally, the index is also declared as "manually maintained" ('an:manual'), which means that programmers must explicitly request that the index be synchronized with the underlying data.

The last index declaration assigns the name news-data:EmpMgr to the index. The index is declared as an 'an:general-equality' index, which, like 'an:value-equality', means that the index does not maintain its keys in any order. The index key expression selects, for each employee E, the ids of the employees managed by E. Notice that this set of ids may be empty. The index will contain an entry mapping the empty sequence to the employees who do not manage anybody. Notice also that no type declaration is required for the key expression. Typically, the employee ids will all be integers or strings or untypedAtomic. All of these cases can be handled by the news-data:EmpMgr index, as well as the not very likely scenario where different kinds of employees have ids of different data types.
  (: The "news-data" Library Module :)

  module namespace news-data = "http://www.news.org/data";

  import module namespace cdml = "http://zorba.io/modules/store/static/collections/dml";

  import schema namespace news-schema = "http://www.news.org/schemas";

  declare namespace an = "http://zorba.io/annotations";

  declare collection news-data:employees as schema-element(news-schema:employee)*;

  declare collection news-data:articles as node()*;

  declare %an:const %an:ordered %an:read-only-nodes collection news-data:months
    as schema-element(news-schema:month)*;

  declare %an:automatic %an:value-equality index news-data:CityEmp
    on nodes cdml:collection(xs:QName("news-data:employees"))/employee
    by .//station/city as xs:string;

  declare %an:manual %an:value-range index news-data:ArtCountEmp
    on nodes cdml:collection(xs:QName("news-data:employees"))/employee[./position/@kind eq "journalist"]
    by count(for $art in cdml:collection(xs:QName("news-data:articles"))//article
             where $art/empid = ./id
             return $art) as xs:integer;

  declare %an:automatic %an:general-equality index news-data:EmpMgr
    on nodes cdml:collection(xs:QName("news-data:employees"))/employee
    by ./manages//@empid;

  declare variable $news-data:employees := xs:QName("news-data:employees");
  declare variable $news-data:articles := xs:QName("news-data:articles");
  declare variable $news-data:months := xs:QName("news-data:months");
  declare variable $news-data:CityEmp := xs:QName("news-data:CityEmp");
  declare variable $news-data:ArtCountEmp := xs:QName("news-data:ArtCountEmp");
  declare variable $news-data:EmpMgr := xs:QName("news-data:EmpMgr");
Once the indexes have been declared in a library module, they can be created. This is done by the "admin-script-3" script shown below. The script must first import the "news-data" module. As far as indexes are concerned, the effect of this import is to create three entries in the static context of the main module, mapping the index names to the index definitions (domain expression, key specification, and properties). Then, the script creates the indexes by invoking the iddl:create function, passing the name of the index as input.

Let us consider the creation of the CityEmp index (the process is similar for the ArtCountEmp and EmpMgr indexes). Index creation starts with retrieving the index definition from the static context, using the index name. Then, an index container is created, whose entries will be pairs associating a city name with a set of employees. Next, the index container is populated using the process outlined earlier: The domain expression is evaluated, and for each employee node E in the domain sequence, the name of the city C where the employee is currently stationed is retrieved by evaluating the key expression, atomizing its result, and checking that the atomic value matches the specified target type. Finally, the pair [C, E] is inserted in the index: if an entry for C exists already, E is inserted in the set associated with C; otherwise, a new entry is created mapping C to the set { E }. The last step in index creation involves registering the index inside an indexes table that maps index names to index containers. The index container will remain registered until it is destroyed by a call to the iddl:delete function (see the "admin-script-4" script below).
  (: The "admin-script-3" script :)

  import module namespace iddl = "http://zorba.io/modules/store/static/indexes/ddl";

  import module namespace news-data = "http://www.news.org/data" at "news_data.xqlib";

  iddl:create($news-data:CityEmp);

  iddl:create($news-data:ArtCountEmp);

  iddl:create($news-data:EmpMgr);
The next step in this example is to show how the indexes can be used to optimize query performance, which, of course, is the primary motivation for supporting indexes in any data-processing system. Zorba provides four functions for index probing: idml:probe-index-point-value, idml:probe-index-range-value, idml:probe-index-point-general, and idml:probe-index-range-general. idml:probe-index-point-value is supported by all kinds of indexes, idml:probe-index-point-general is supported by general indexes (equality and range), idml:probe-index-range-value is supported by value and general range indexes, and idml:probe-index-range-general is supported by general range indexes only.

The "probe-1" query illustrates the use of idml:probe-index-point-value. The query returns all employees stationed in Paris. As shown, the idml:probe-index-point-value function takes the index name and the key value "Paris" as inputs. It uses the index name to find the index container via the indexes table, looks up the entry for "Paris" inside this container, and returns all the associated employee nodes.

The "probe-2" query illustrates index probing via the idml:probe-index-range-value function. The query returns all journalists who have written at least 100 articles. As shown, the first parameter of the idml:probe-index-range-value function is the index name, followed by six parameters per key expression. The six parameters specify a range of values for the key: the first two are the lower and upper values of the range, the next two are booleans that specify whether the range does indeed have a lower and/or upper bound, and the last two are also booleans that specify whether the range is open or closed from below or above (i.e., whether the lower/upper bound is included in the range or not).

The "probe-3" query illustrates the use of idml:probe-index-point-general. The query returns the managers of the employees whose id is of type string (or subtype) and its value is "100" or "200". It will also return the managers of the employees whose id is of type untypedAtomic and its value, when cast to string, is "100" or "200".

The "no-probe-1", "no-probe-2", and "no-probe-3" queries return the same results as "probe-1", "probe-2", and "probe-3", respectively, but without using any index. Normally, the performance of the probe queries will be much better than that of the corresponding no-probe queries. This is because, in general, indexes organize their entries in ways that make the execution of the probe functions very efficient. Typically, some kind of hash table (for value equality indexes) or ordered tree (for value range indexes) data structure is employed, and as we will see, Zorba supports both kinds of indexes. So, for example, the "probe-1" query does not have to access every entry in the index until it finds the one for Paris, whereas the "no-probe-1" query has to access every employee in the collection and check his/her city.

People familiar with SQL and modern relational DBMSs would probably expect the query optimizer to be able to automatically rewrite queries like "no-probe-1" to queries like "probe-1". The Zorba query optimizer does not yet detect index-related rewrites automatically. Although we do plan to offer automatic index-related rewrites in the near future, we also expect the probing functions to remain useful for manual rewrites because both the XQuery language and the kind of indexes that are allowed in Zorba can be much more complex than their relational counterparts.
  (: The "probe-1" query :)

  import module namespace idml = "http://zorba.io/modules/store/static/indexes/dml";

  import module namespace news-data = "http://www.news.org/data" at "news_data.xqlib";

  idml:probe-index-point-value($news-data:CityEmp, "Paris")
  (: The "probe-2" query :)

  import module namespace idml = "http://zorba.io/modules/store/static/indexes/dml";

  import module namespace news-data = "http://www.news.org/data" at "news_data.xqlib";

  idml:probe-index-range-value($news-data:ArtCountEmp, 100, (), true(), false(), true(), false())
  (: The "probe-3" query :)

  import module namespace idml = "http://zorba.io/modules/store/static/indexes/dml";

  import module namespace news-data = "http://www.news.org/data" at "news_data.xqlib";

  idml:probe-index-point-general($news-data:EmpMgr, ("100", "200"))
  (: The "no-probe-1" query :)

  import module namespace cdml = "http://zorba.io/modules/store/static/collections/dml";

  import module namespace news-data = "http://www.news.org/data" at "news_data.xqlib";

  cdml:collection($news-data:employees)/employee[.//station/city eq "Paris"]
  (: The "no-probe-2" query :)

  import module namespace cdml = "http://zorba.io/modules/store/static/collections/dml";

  import module namespace news-data = "http://www.news.org/data" at "news_data.xqlib";

  for $emp in cdml:collection($news-data:employees)/employee[./position/@kind eq "journalist"]
  where 100 le count(for $art in cdml:collection(xs:QName("news-data:articles"))//article
                     where $art/empid eq $emp/id
                     return $art)
  return $emp
  (: The "no-probe-3" query :)

  import module namespace cdml = "http://zorba.io/modules/store/static/collections/dml";

  import module namespace news-data = "http://www.news.org/data" at "news_data.xqlib";

  for $mgr in cdml:collection($news-data:employees)/employee
  where $mgr/manages//@empid = ("100", "200")
  return $mgr
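The six range parameters used in "probe-2" can also describe a range that is bounded on both sides. The following sketch, which is not part of the original example set, would return the journalists who have written at least 10 but fewer than 50 articles. The bounds are arbitrary, and the booleans are written with the standard fn:true() and fn:false() functions.

```xquery
import module namespace idml = "http://zorba.io/modules/store/static/indexes/dml";

import module namespace news-data = "http://www.news.org/data" at "news_data.xqlib";

idml:probe-index-range-value($news-data:ArtCountEmp,
  10, 50,            (: lower and upper values of the range :)
  true(), true(),    (: the range has both a lower and an upper bound :)
  true(), false())   (: lower bound included, upper bound excluded :)
```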


Now, let us consider what happens when the data on which an index is built gets updated. In general, index maintenance is the operation where the index contents are updated so that they reflect the index definition with respect to the current snapshot of the data. Zorba offers two maintenance modes: manual and automatic. If an index is declared as 'an:manual', index maintenance is done only when the function idml:refresh-index is invoked inside a query. Essentially, in manual mode maintenance is in the control of the query programmers, and the index may become stale between two consecutive calls to the idml:refresh-index function. In contrast, if an index is declared as 'an:automatic', Zorba guarantees that the index stays up-to-date at any given time.

In this example, the CityEmp index was declared as automatic. The "index-maintenance" query shown below transfers the employee with id "007" from his current city, say Paris, to Beijing. Since index CityEmp is automatic, after the update is applied, Zorba will initiate a maintenance operation on the index, whereby the employee node will be removed from the node set associated with Paris and inserted into the node set associated with Beijing (if there is no other employee stationed in Beijing already, an entry for it will be created first). Notice that although the index is not explicitly referenced anywhere in this query, its definition must still be available to the query because it is needed to perform the index maintenance. In this example, the query imports the "news-data" module because it contains the declaration for the employees collection, which is referenced by the query. But the "news-data" module contains the index declaration as well, so index maintenance can find the index definition. In general, it is a best practice to declare an index in the same module as the collections that are referenced by the index.

The ArtCountEmp index is more complex than the CityEmp index, so the system may not be able to maintain it in an efficient way. Furthermore, the index contains "statistical" information, so it may be acceptable if its contents are not always in sync with the underlying data. For these reasons, the ArtCountEmp index was declared as 'an:manual'.
  (: The "index-maintenance" query :)

  import module namespace cdml = "http://zorba.io/modules/store/static/collections/dml";

  import module namespace news-data = "http://www.news.org/data" at "news_data.xqlib";

  replace node value cdml:collection($news-data:employees)/employee[@id eq "007"]//station/city
  with "Beijing"
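For a manual index such as ArtCountEmp, the refresh has to be requested explicitly. The sketch below assumes that idml:refresh-index takes the index name as its only argument, that it may be called as a statement in a script like the other updating functions, and that the articles collection accepts appended nodes; the article content is hypothetical.

```xquery
import module namespace cdml = "http://zorba.io/modules/store/static/collections/dml";

import module namespace idml = "http://zorba.io/modules/store/static/indexes/dml";

import module namespace news-data = "http://www.news.org/data" at "news_data.xqlib";

(: Add one more article for employee 007; ArtCountEmp is stale from this point on. :)
cdml:insert-nodes-last($news-data:articles,
  <article><empid>007</empid><title>Late edition</title></article>);

(: Bring the manual index back in sync with the underlying data. :)
idml:refresh-index($news-data:ArtCountEmp);
```

Until the refresh is applied, a probe on ArtCountEmp may still return employee 007 under the old article count.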
Finally, we conclude this example with a query that shows how to destroy an index. As shown in "admin-script-4" below, index deletion is done via the iddl:delete function. The function simply destroys the index container and removes the mapping between the index name and the index container from the indexes table. After the index is deleted, any query that tries to access the index will receive an error.
  (: The "admin-script-4" query :)

  import module namespace iddl = "http://zorba.io/modules/store/static/indexes/ddl";

  import module namespace news-data = "http://www.news.org/data" at "news_data.xqlib";

  iddl:delete($news-data:CityEmp);

Index Declaration

  IndexDecl ::= 'index' IndexName
               'on' 'nodes' IndexDomainExpr
               'by' IndexKeySpec (',' IndexKeySpec)*

  IndexName ::= EQName

  IndexDomainExpr ::= PathExpr

  IndexKeySpec ::= IndexKeyExpr IndexKeyTypeDecl? IndexKeyCollation?

  IndexKeyExpr ::= PathExpr

  IndexKeyTypeDecl ::= 'as' AtomicType OccurrenceIndicator?

  AtomicType ::= EQName

  IndexKeyCollation ::= 'collation' URILiteral

  Note: the following annotations are accepted within the context of an index declaration:

     %an:unique, %an:nonunique,
     %an:value-range, %an:value-equality, 
     %an:general-range, %an:general-equality,
     %an:manual or %an:automatic
Syntactically, each index is defined by an index declaration, which specifies a unique name for the index as a QName, the index domain expression, a number of key specifications, and a set of index properties (given as annotations; see Annotations on Collections and Indexes). The IndexDecl syntax shown above is common to both value indexes and general indexes. Whether an index is a value or a general index is determined by the value of the usage property, which is explained below.

Index declarations (for both value and general indexes) must be placed inside module prologs. The Prolog syntax is extended accordingly, as shown above. An additional constraint (not expressible syntactically) is that only library modules may contain index declarations [zerr:ZDST0023]. The reasons for this rule are the same as those for collections (see Collection Declaration). Furthermore, the QName of an index must have the same namespace URI as the target namespace URI of the declaring library module [zerr:ZDST0036].

To accommodate index declarations, Zorba extends the static context with a component called the statically known indexes. This is a map whose entries associate an expanded QName with an implementation-dependent representation of the information contained in an index declaration with the same QName. Each index declaration adds an entry to the statically known indexes of the module containing the declaration. If the expanded QName of the index is equal to the expanded QName of another index in the statically known indexes of the same module, a static error is raised [zerr:ZDST0021]. Like the statically known collections, the statically known indexes of a module that is imported by another module are copied into the statically known indexes of the importing module. It is a static error [zerr:ZDST0022] if the expanded QName of an index declared in an imported module is equal to the expanded QName of an index declared in the importing module or in another imported module (even if the declarations are consistent).

Zorba defines three index properties which are syntactically expressed as annotations: uniqueness (with possible values 'an:unique' or 'an:nonunique'), usage (with possible values 'an:value-range', 'an:value-equality', 'an:general-range', or 'an:general-equality'), and maintenance mode (with possible values 'an:manual' or 'an:automatic'). The syntax allows the values for these properties to be listed in any order or not be specified at all. If not specified, the default values for uniqueness, usage, and maintenance mode are 'an:nonunique', 'an:value-equality', and 'an:automatic', respectively. It is a static error [err:XQST0106] if more than one value is listed in an index declaration for any of these properties.

The uniqueness property determines the kind of relationship between keys and domain nodes: if the index is declared as 'an:unique', Zorba makes sure that the relationship is one-to-one, that is, each index entry associates a key value (or key tuple in the case of value indexes) with exactly one domain node. Otherwise, if the index is 'an:nonunique', multiple domain nodes may have the same key value, and as a result, each index entry associates a key with a set of domain nodes. In the current implementation, an index cannot be declared as unique if it is a general index whose IndexKeyTypeDecl is either absent or specifies xs:anyAtomicType or xs:untypedAtomic as its atomic type [zerr:ZDST0025].

The usage property specifies the kind of the index based on the query expressions that may be optimized by using the index. A value equality index can optimize expressions involving value equality predicates only. The "probe-1" and "no-probe-1" queries in Indexes in action - A simple example are an example of such usage. As shown there, a value equality index supports the idml:probe-index-point-value function. A value range index can optimize expressions involving any kind of value comparison. The "probe-2" and "no-probe-2" queries in Indexes in action - A simple example are an example of such usage. A value range index supports both the idml:probe-index-point-value and the idml:probe-index-range-value functions. A general equality index can optimize expressions involving either value equality or general equality predicates. Finally, a general range index can optimize expressions involving any kind of value or general comparison predicates.

The maintenance mode specifies how index maintenance is done. The current Zorba implementation offers two maintenance modes: 'an:manual' and 'an:automatic'. For a manual index, maintenance is done only when the function idml:refresh-index (described in Index Maintenance) is invoked inside a query. Essentially, in manual mode maintenance is in the control of the query programmers, and the index may become stale between two consecutive calls to the idml:refresh-index function. In contrast, for an automatic index, Zorba guarantees that the index stays up-to-date at any given time.

The index declaration syntax is very liberal with respect to the expressions that can appear as domain or key expressions. However, the following semantic restrictions are imposed on the domain expression and each of the key expressions:
  • They must be deterministic expressions [zerr:ZDST0028].
  • They must be simple expressions (i.e., not updating or sequential) [zerr:ZDST0033].
  • They must not invoke any input functions other than cdml:collection [zerr:ZDST0029]. Moreover, the argument to each cdml:collection call must be a constant expression returning a QName value [zerr:ZDST0030]. (A constant expression is an expression that doesn't access the dynamic context.)
  • They must not reference any variables other than the ones defined inside the expressions themselves [zerr:ZDST0031].
  • If the index is declared as 'an:automatic', an error is raised [zerr:ZDST0034] if the domain and/or the key expressions are too complex for Zorba to perform index maintenance in an efficient manner (see Index Maintenance for details).
Furthermore, the domain expression must satisfy the following additional semantic restrictions:
  • Its context item, context position, and context size are considered undefined, and as a result they must not be referenced [zerr:ZDST0032].
  • It must generate a sequence of nodes [zerr:XDTY0010].
  • Each node in the domain sequence must belong to a collection that appears in the available collections of the module that contains the index declaration [zerr:ZDDY0020].
  • For general indexes only, the domain expression must not return any duplicate nodes [zerr:ZDDY0028].
With each key expression, an index declaration associates a key type and a key collation. The triplet IndexKeyExpr, IndexKeyTypeDecl, IndexKeyCollation is called a keyspec. For general indexes, the number of keyspecs must be exactly one [zerr:ZDST0035]. The IndexKeyTypeDecl is optional for general indexes (in which case it is assumed to be xs:anyAtomicType*), but is required for value indexes [zerr:ZDST0027]. The IndexKeyTypeDecl provides a sequence type that the atomized result of the associated key expression (for each domain node) must match according to the rules of sequence type matching. For value indexes, the atomic type specified in IndexKeyTypeDecl must not be xs:anyAtomicType or xs:untypedAtomic [zerr:ZDST0027]. Furthermore, for value indexes, the occurrence indicator must be either absent or equal to '?' [zerr:ZDST0027]. Finally, if the index is a value range or general range index, an ordering must exist among the values in the type domain [zerr:ZDST0027] (this rule excludes the following atomic types and their subtypes: QName, NOTATION, hexBinary, base64Binary, gYearMonth, gYear, gMonthDay, gMonth, and gDay).

If the key type in a keyspec is xs:string (or a subtype of it), the IndexKeyCollation specifies the collation to use when comparing key values from this keyspec. If no collation is specified, the default collation from the static context of the declaring module is used.
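
The keyspec restrictions can be summarized as a small checker. This Python sketch is a model only: a keyspec is reduced to a pair of atomic type name and occurrence indicator, subtypes are not considered, and the function name is hypothetical.

```python
# Atomic types for which no ordering exists (subtypes are ignored in this sketch).
UNORDERED = {"xs:QName", "xs:NOTATION", "xs:hexBinary", "xs:base64Binary",
             "xs:gYearMonth", "xs:gYear", "xs:gMonthDay", "xs:gMonth", "xs:gDay"}


def check_keyspecs(usage, keyspecs):
    """Return the error code for the first violated rule, or None.

    usage:    one of the four usage annotations, e.g. 'an:value-range'
    keyspecs: list of (atomic_type or None, occurrence_indicator) pairs,
              where None means that the IndexKeyTypeDecl is absent
    """
    general = usage.startswith("an:general")
    ranged = usage.endswith("range")
    if general and len(keyspecs) != 1:
        return "zerr:ZDST0035"
    for atomic_type, occurrence in keyspecs:
        if not general:
            if atomic_type in (None, "xs:anyAtomicType", "xs:untypedAtomic"):
                return "zerr:ZDST0027"
            if occurrence not in ("", "?"):
                return "zerr:ZDST0027"
        if ranged and atomic_type in UNORDERED:
            return "zerr:ZDST0027"
    return None
```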

Index Creation


As explained already, indexes are just sets of index entries, where an index entry maps a key item or a key tuple to a set of domain nodes (to be more precise, an index entry contains some kind of "pointers" to nodes, not the nodes themselves). In terms of the XQuery language, indexes "live" in the dynamic context. In particular, Zorba extends the dynamic context with a component called the available indexes. This is a map whose entries associate the expanded QName of an index with the entry set for that index.

In practice, the available indexes component is implemented by Zorba's storage system. To begin with, each index is implemented by some appropriate data structure that acts as an index entry container. The description of potential data structures is beyond the scope of this document, but the typical choices are either some sort of hash table(s) (for equality indexes) or some kind of ordered tree(s) (for range indexes). To manage these containers, the store maintains an indexes table, which maps index names to index entry containers. The indexes table is accessible by all queries, so once an entry is added to the table, the associated index is assumed to be available to every query and every module that participates in the execution of that query.

Creation of an index involves creating an initially empty index entry container, populating that container with the entries computed by the domain and key expressions of the index, and "registering" that container in the indexes table. All this is done by the iddl:create function that is described below. In fact, iddl:create is an updating function, so instead of actually creating the index, it generates a pending update primitive that becomes part of a pending update list (PUL) to be applied at a later time. The update primitive is also described below.
  declare updating function iddl:create($indexName as xs:QName)

  upd:createIndex($indexName as xs:QName)
The create function is evaluated as follows:
  • If the given expanded QName does not identify an index among the statically known indexes in the static context of the invoking module, an error is raised [zerr:ZDDY0021].
  • If the given expanded QName identifies an index that is available already, an error is raised [zerr:ZDDY0022].
  • The result of the function is an empty XDM instance and a pending update list that consists of a single update primitive: upd:createIndex($indexName).
The update primitive is applied as follows:
  • An empty index entry container is created.
  • The domain expression is evaluated first. If the result of the domain expression contains an item that is not a node, an error is raised [zerr:XDTY0010]. If the result of the domain expression contains any duplicate nodes, then for value indexes, the duplicate nodes are removed, but for general indexes, an error is raised [zerr:ZDDY0028]. The result of the domain expression after duplicate elimination is called the domain sequence. If the domain sequence contains a node that does not belong to a collection, an error is raised [zerr:ZDDY0020].
  • For each node D in the domain sequence, the IndexKeySpecs are evaluated in some implementation dependent order. An IndexKeySpec is evaluated as follows:
    • The key expression in the IndexKeySpec is evaluated, with D serving as its context item.
    • Atomization is applied to the result of the key expression.
    • The result of atomization is matched against the associated IndexKeyTypeDecl, according to the rules of sequence type matching. If the type match fails, an error is raised [zerr:XDTY0011].
    • Duplicate values (which may arise in the case of general indexes only) are eliminated from the atomized sequence.
  • If the index is a value index:
    • The result of each IndexKeySpec is a single atomic item or the empty sequence. We call this result a key value.
    • Let Di be the i-th domain node, and Kij be the key value computed for Di by the j-th IndexKeySpec (where the numbering of the IndexKeySpecs is done using their order of appearance in the index declaration). Let Ki be the tuple [Ki1, ..., KiM], where M is the number of IndexKeySpecs. The next step is to insert in the index a mapping from Ki to Di. This step is performed for each node in the domain sequence. The order in which the domain sequence is processed is implementation dependent.
    • If the index is declared as unique, the relationship between key tuples and domain nodes is one-to-one. In this case, if the index already contains an entry whose key tuple is equal to Ki, an error is raised [zerr:ZDDY0024]. Otherwise, the entry [Ki, Di] is inserted in the index container.
    • If the index is non-unique, then if it already contains an entry whose key tuple is equal to Ki, Di is added to the set associated with Ki. Otherwise, the entry [Ki, { Di }] is inserted in the index.
  • If the index is a general index:
    • In the current implementation, there can be only one IndexKeySpec, but contrary to value indexes, the result of this IndexKeySpec may be a sequence of any number of atomic items, and the items may have different data types. We call this sequence a key sequence, and each atomic item in it a key item (the key sequence may also be the empty sequence).
    • An error is raised [zerr:XDTY0012] if the index is a range index and any of the key items has a type that is not xs:untypedAtomic and for which no ordering relationship exists.
    • In this step, an expanded key sequence is constructed for each domain node. If the atomic type specified in the IndexKeyTypeDecl is neither xs:untypedAtomic nor xs:anyAtomicType, the expanded key sequence is the same as the original key sequence. Otherwise, let Di be the i-th domain node, and Ki be the key sequence computed for Di. If any key item in Ki has type xs:untypedAtomic, the item is removed from Ki and is cast to every primitive builtin data type. For each successful cast, the resulting item is inserted back into Ki.
    • The next step is to insert in the index a mapping from Kij to Di, for each key item Kij in the expanded key sequence Ki. This step is performed for each node in the domain sequence. The order in which the domain sequence is processed is implementation dependent.
    • If the index is declared as unique, the relationship between key items and domain nodes is one-to-one. In this case, if the index already contains an entry whose key item is equal to Kij, an error is raised [zerr:ZDDY0024]. Otherwise, the entry [Kij, Di] is inserted in the index container.
    • If the index is non-unique, then if it already contains an entry whose key K is equal to Kij, Di is added to the node set associated with K. Otherwise, the entry [Kij, { Di }] is inserted into the index.
    • The index maintains a "special" entry for all domain nodes whose key sequence is empty. All such nodes are inserted in the node set associated with this entry.
  • An entry is added to the indexes table. The entry maps the expanded QName of the index to the index entry container.
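The population steps above can be sketched as a Python model. It is not Zorba's implementation: nodes are plain Python objects compared by identity, key expressions are passed as callables, None stands for the empty sequence, and the casting of xs:untypedAtomic key items (the expanded key sequence) is left out. All function names are hypothetical.

```python
class DynamicError(Exception):
    def __init__(self, code):
        super().__init__(code)
        self.code = code


def build_value_index(domain, key_fns, unique=False):
    """key_fns: one callable per IndexKeySpec, each returning an atomic value or None."""
    seen, domain_seq = set(), []
    for node in domain:                    # value indexes silently drop duplicate nodes
        if id(node) not in seen:
            seen.add(id(node))
            domain_seq.append(node)
    container = {}
    for node in domain_seq:
        key = tuple(fn(node) for fn in key_fns)      # the key tuple Ki
        if unique:
            if key in container:
                raise DynamicError("zerr:ZDDY0024")
            container[key] = node
        else:
            container.setdefault(key, []).append(node)
    return container


def build_general_index(domain, key_fn, unique=False):
    """key_fn returns the key sequence of a node as a list of atomic items."""
    container, empty_entry, seen = {}, [], set()
    for node in domain:
        if id(node) in seen:               # general indexes reject duplicate nodes
            raise DynamicError("zerr:ZDDY0028")
        seen.add(id(node))
        key_items = list(dict.fromkeys(key_fn(node)))   # drop duplicate key values
        if not key_items:
            empty_entry.append(node)       # the "special" entry for empty key sequences
        for item in key_items:
            if unique:
                if item in container:
                    raise DynamicError("zerr:ZDDY0024")
                container[item] = node
            else:
                container.setdefault(item, []).append(node)
    return container, empty_entry
```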

Index Deletion

To destroy an index, Zorba provides the delete updating function. The function itself and its associated update primitive are described below.
  declare updating function iddl:delete($indexName as xs:QName)

  upd:deleteIndex($indexName as xs:QName)
The delete function is evaluated as follows:
  • If the given expanded QName does not identify an index among the statically known indexes in the static context of the invoking module, an error is raised [zerr:ZDDY0021].
  • If the given expanded QName does not identify an index among the available indexes in the dynamic context of the invoking module, an error is raised [zerr:ZDDY0023].
  • The result of the function is an empty XDM instance and a pending update list that consists of a single update primitive: upd:deleteIndex($indexName).
The update primitive is applied as follows:
  • The index entry container for the index is found via the indexes table.
  • All entries in the container are destroyed.
  • The container itself is destroyed.
  • The entry mapping the index name to the index entry container is removed from the indexes table.
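The interplay between the create and delete functions, their update primitives, and the indexes table can be modeled as follows. This Python sketch is an illustration under simplifying assumptions: the statically known indexes and the indexes table live in one hypothetical Store class, index names are plain strings, and the container is an empty dict instead of a populated structure.

```python
class DynamicError(Exception):
    def __init__(self, code):
        super().__init__(code)
        self.code = code


class Store:
    def __init__(self, statically_known):
        self.statically_known = set(statically_known)
        self.indexes_table = {}        # index name -> index entry container

    def create(self, name):
        """Models iddl:create: checks the contexts and returns a PUL."""
        if name not in self.statically_known:
            raise DynamicError("zerr:ZDDY0021")
        if name in self.indexes_table:
            raise DynamicError("zerr:ZDDY0022")
        return [("createIndex", name)]

    def delete(self, name):
        """Models iddl:delete: checks the contexts and returns a PUL."""
        if name not in self.statically_known:
            raise DynamicError("zerr:ZDDY0021")
        if name not in self.indexes_table:
            raise DynamicError("zerr:ZDDY0023")
        return [("deleteIndex", name)]

    def apply(self, pul):
        """Applies the update primitives; nothing changes before this call."""
        for primitive, name in pul:
            if primitive == "createIndex":
                self.indexes_table[name] = {}      # population is omitted here
            elif primitive == "deleteIndex":
                container = self.indexes_table.pop(name)
                container.clear()                  # destroy all entries
```

Note that calling create or delete alone has no effect on the indexes table. The index only appears or disappears once the returned pending update list is applied.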

Index Probing


Probing an index means retrieving the domain nodes associated with a particular search condition. Probing can be done via the xqddf functions idml:probe-index-point-value, idml:probe-index-point-general, idml:probe-index-range-value, or idml:probe-index-range-general. For each of these functions, the first argument is a QName identifying an index. The rest of the arguments specify the search condition. For all functions, the index must exist in both the statically known indexes and the available indexes of the invoking module; otherwise error zerr:ZDDY0021 or zerr:ZDDY0023 is raised, respectively. All of the functions return their result sorted in document order and without duplicate nodes.

idml:probe-index-point-value
  idml:probe-index-point-value($indexUri as xs:QName,
                               $key1     as xs:anyAtomicType?,
                               ...,
                               $keyM     as xs:anyAtomicType?) as node()*
The probe-index-point-value function retrieves the domain nodes associated by value equality with a given search tuple. The search tuple consists of a number of search keys, where each search key is either an atomic item or the empty sequence. The result of this function is either an error or the set of domain nodes for which the following xquery expression returns true:
$key1 eq $node/keyExpr1 and ... and $keyM eq $node/keyExprM
where keyExpr-i is the expression specified in the ith keyspec of the index and M is the number of keyspecs for the index (which must be equal to the number of search keys). Notice that this definition implies that if any of the search keys is the empty sequence, the result of the probe is also the empty sequence.

In addition to the errors that may be raised by the above expression, the probe-index-point-value function may raise the following errors:
  • [zerr:ZDDY0021] or [zerr:ZDDY0023], if the index is not among the statically known indexes or the available indexes, respectively.
  • [zerr:ZDDY0025], if the number of search keys is not equal to the number of keyspecs found in the index declaration.
  • [err:XPTY0004], if a non-empty search key is given whose type does not match the sequence type specified in the corresponding keyspec.
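A point probe on a value index can be modeled as a lookup of the search tuple. In this Python sketch (a model, not the Zorba function), the container maps key tuples to lists of nodes, nodes are integers that stand for document-order positions, and None stands for the empty sequence. Type checking of the search keys is omitted.

```python
def probe_point_value(container, num_keyspecs, *search_keys):
    """Return the nodes whose key tuple equals the search tuple."""
    if len(search_keys) != num_keyspecs:
        raise ValueError("zerr:ZDDY0025")
    if any(key is None for key in search_keys):
        # 'eq' with the empty sequence is never true, so nothing matches.
        return []
    return sorted(set(container.get(tuple(search_keys), [])))
```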
idml:probe-index-point-general
  idml:probe-index-point-general($indexUri as xs:QName,
                                 $keys     as xs:anyAtomicType*) as node()*

The probe-index-point-general function retrieves the domain nodes associated by general equality with a given search sequence. The search sequence consists of an arbitrary number of search keys, where each search key is an atomic item. The function is supported by general indexes only [zerr:ZDDY0029]. Its result is either an error or the set of domain nodes for which the following xquery expression returns true:
$keys = $node/keyExpr
where keyExpr is the expression specified in the keyspec of the index (remember that for general indexes, there can be only one keyspec).

In addition to the errors that may be raised by the above expression, the probe-index-point-general function may raise the following errors:
  • [zerr:ZDDY0021] or [zerr:ZDDY0023], if the index is not among the statically known indexes or the available indexes, respectively.
  • [zerr:ZDDY0029], if the index is not general.
  • [err:XPTY0004], if the search sequence contains a search key whose type does not match the sequence type specified in the keyspec of the index.
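Because general equality is existential, a node qualifies as soon as any of its key items equals any of the search keys. The following Python sketch models this as a union of lookups. The container layout (key item to list of nodes) and the use of integers as nodes in document order are assumptions of the model.

```python
def probe_point_general(container, search_keys):
    """Return every node that has at least one key item among the search keys."""
    hits = set()
    for key in search_keys:
        hits.update(container.get(key, ()))
    return sorted(hits)          # document order, no duplicates
```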
idml:probe-index-range-value
  idml:probe-index-range-value($indexUri            as xs:QName,
                               $lowerBound1         as xs:anyAtomicType?,
                               $upperBound1         as xs:anyAtomicType?,
                               $haveLowerBound1     as xs:boolean,
                               $haveUpperBound1     as xs:boolean,
                               $lowerBoundIncluded1 as xs:boolean,
                               $upperBoundIncluded1 as xs:boolean,
                               ...,
                               $lowerBoundM         as xs:anyAtomicType?,
                               $upperBoundM         as xs:anyAtomicType?,
                               $haveLowerBoundM     as xs:boolean,
                               $haveUpperBoundM     as xs:boolean,
                               $lowerBoundIncludedM as xs:boolean,
                               $upperBoundIncludedM as xs:boolean) as node()*
The probe-index-range-value function retrieves the domain nodes associated by value order-comparison (operators le, lt, ge, gt) with a given search box. The search box is specified as a number M of rangespecs, where each rangespec consists of six values. The number M must be greater than 0 and less than or equal to the number N of keyspecs found in the index declaration [zerr:ZDDY0025]. If M is less than N, then the "missing" rangespecs are assumed to have the following value: [(), (), false, false, false, false]. As a result, from now on, we can assume that M is equal to N. (Remember that for general indexes, there can be only one IndexKeySpec, and as a result, for general indexes, M = N = 1.)

The ith rangespec corresponds to the ith keyspec, and specifies a search condition on the key values that are produced by evaluating that keyspec for every domain node. Specifically, we define the ith rangespec result as the set of domain nodes for which the following xquery expression returns true:
if ($haveLowerBound-i and $haveUpperBound-i) then
  $lowerBound-i lop $node/keyExpr-i and $node/keyExpr-i uop $upperBound-i
else if ($haveLowerBound-i) then
  $lowerBound-i lop $node/keyExpr-i
else if ($haveUpperBound-i) then
  $node/keyExpr-i uop $upperBound-i
else
  fn:true()
where keyExpr-i is the expression specified by the ith keyspec of the index, lop is either the le or the lt operator depending on whether $lowerBoundIncluded-i is true or false, and uop is either the le or the lt operator depending on whether $upperBoundIncluded-i is true or false.

The result of the probe-index-range-value function is either an error, or the intersection of all the rangespec results. In addition to the errors that may be raised by a rangespec expression, the function may raise the following errors:
  • [zerr:ZDDY0021] or [zerr:ZDDY0023], if the index is not among the statically known indexes or the available indexes, respectively.
  • [zerr:ZDDY0026], if the index is not a range index.
  • [zerr:ZDDY0025], if the number of rangespecs passed as arguments is zero or greater than the number of keys declared for the index.
  • [err:XPTY0004], if $haveLowerBound-i is true and $lowerBound-i is an atomic item whose type does not match the sequence type specified by the ith keyspec, or $haveUpperBound-i is true and $upperBound-i is an atomic item whose type does not match the sequence type specified by the ith keyspec.
  • [zerr:ZDDY0034], if (a) the index is general (in which case there is only one rangespec), (b) the index is untyped, (c) there is both a lower and an upper bound, and (d) given the types T1 and T2 of the lower and upper bounds, T1 is not a subtype of T2 and T2 is not a subtype of T1.
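The search-box semantics can be modeled in Python as the intersection of the per-rangespec conditions. The sketch makes several assumptions: the container maps key tuples to lists of nodes, nodes are integers in document order, None stands for the empty sequence, each rangespec is passed as a 6-tuple, and type checks are left out.

```python
import operator


def rangespec_matches(key, lower, upper, have_lower, have_upper,
                      lower_included, upper_included):
    lop = operator.le if lower_included else operator.lt
    uop = operator.le if upper_included else operator.lt
    # A value comparison with the empty sequence (None here) is never true.
    if have_lower and (lower is None or key is None or not lop(lower, key)):
        return False
    if have_upper and (upper is None or key is None or not uop(key, upper)):
        return False
    return True


def probe_range_value(container, num_keyspecs, *rangespecs):
    if not 0 < len(rangespecs) <= num_keyspecs:
        raise ValueError("zerr:ZDDY0025")
    # Missing rangespecs impose no condition on their key.
    missing = (None, None, False, False, False, False)
    specs = list(rangespecs) + [missing] * (num_keyspecs - len(rangespecs))
    hits = set()
    for key_tuple, nodes in container.items():
        if all(rangespec_matches(k, *spec) for k, spec in zip(key_tuple, specs)):
            hits.update(nodes)
    return sorted(hits)
```

A real range index would use an ordered structure instead of scanning every entry. The scan is used here only to keep the model short.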
idml:probe-index-range-general
  idml:probe-index-range-general($indexUri            as xs:QName,
                                 $lowerBoundKeys      as xs:anyAtomicType*,
                                 $upperBoundKeys      as xs:anyAtomicType*,
                                 $haveLowerBound      as xs:boolean,
                                 $haveUpperBound      as xs:boolean,
                                 $lowerBoundIncluded  as xs:boolean,
                                 $upperBoundIncluded  as xs:boolean) as node()*
The probe-index-range-general function retrieves the domain nodes associated by general order-comparison (operators <=, <, >=, >) with one or two search sequences. Each search sequence consists of an arbitrary number of search keys, where each search key is an atomic item. This function is supported by general range indexes only [zerr:ZDDY0030]. Its result is either an error or the set of domain nodes for which the following xquery expression returns true:
if ($haveLowerBound and $haveUpperBound) then
  $lowerBoundKeys lop $node/keyExpr and $node/keyExpr uop $upperBoundKeys
else if ($haveLowerBound) then
  $lowerBoundKeys lop $node/keyExpr
else if ($haveUpperBound) then
  $node/keyExpr uop $upperBoundKeys
else
  fn:true()
where keyExpr is the expression specified in the keyspec of the index, lop is either the <= or the < operator depending on whether $lowerBoundIncluded is true or false, and uop is either the <= or the < operator depending on whether $upperBoundIncluded is true or false.

In addition to the errors that may be raised by the above expression, the probe-index-range-general function may raise the following errors:
  • [zerr:ZDDY0021] or [zerr:ZDDY0023], if the index is not among the statically known indexes or the available indexes, respectively.
  • [zerr:ZDDY0030], if the index is not a general range index.
  • [err:XPTY0004], if $haveLowerBound is true and $lowerBoundKeys contains an atomic item whose type does not match the sequence type specified by the index keyspec, or $haveUpperBound is true and $upperBoundKeys contains an atomic item whose type does not match the sequence type specified by the index keyspec.
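The existential nature of general comparisons has a consequence that is easy to miss: the lower and the upper bound may be satisfied by different key items of the same node. The Python model below makes this visible. It assumes that the index is given as a list of (node, key sequence) pairs, with integers as nodes in document order. The function names are hypothetical.

```python
import operator


def general_compare(op, left, right):
    """General comparison: true if any pair of items satisfies op."""
    return any(op(l, r) for l in left for r in right)


def probe_range_general(entries, lower_keys, upper_keys, have_lower, have_upper,
                        lower_included, upper_included):
    lop = operator.le if lower_included else operator.lt
    uop = operator.le if upper_included else operator.lt
    result = []
    for node, keys in entries:
        if have_lower and not general_compare(lop, lower_keys, keys):
            continue
        if have_upper and not general_compare(uop, keys, upper_keys):
            continue
        result.append(node)
    return sorted(result)
```

With entries [(1, [5, 40]), (2, [12]), (3, [])] and the bounds [10] and [20] (both included), node 1 qualifies although none of its key items lies between 10 and 20: 40 satisfies the lower bound and 5 satisfies the upper bound.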

Retrieving Index Keys

In addition to probing an index, the idml module also provides a function that allows listing all the keys contained in an index.
  idml:keys($indexName as xs:QName) as item()*
This function returns a sequence of element nodes. Each node in the sequence represents one key contained in the index and has the following structure:
 <key xmlns="http://zorba.io/modules/store/static/indexes/dml">
   <attribute value="key_1"/>
   ...
   <attribute value="key_n"/>
 </key>
The order of the attribute elements reflects the order of the key specifications in the declaration of the index. Also, the types of the values of the attributes are the types of the keys as they are declared. If a value attribute is not present, this means that the value of the corresponding key in the index is the empty sequence.
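
A host application that receives the serialized result of idml:keys could turn it into key tuples as sketched below. This is an assumption-laden illustration: it presumes that the key elements arrive as one XML string, wraps them in a hypothetical keys root element for parsing, and uses Python's standard xml.etree module. None stands for a key whose value is the empty sequence.

```python
import xml.etree.ElementTree as ET

NS = "http://zorba.io/modules/store/static/indexes/dml"


def key_tuples(key_elements_xml):
    """Convert serialized <key> elements into a list of tuples of strings."""
    root = ET.fromstring(f"<keys>{key_elements_xml}</keys>")
    tuples = []
    for key in root.findall(f"{{{NS}}}key"):
        attributes = key.findall(f"{{{NS}}}attribute")
        # A missing value attribute means the key is the empty sequence.
        tuples.append(tuple(a.get("value") for a in attributes))
    return tuples
```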

Index Maintenance


An index is said to be up-to-date if its content reflects the index definition on the current data snapshot, i.e., the contents are the same as those that would be produced if the iddl:create function was invoked on the same index and with the same underlying data. An index is said to be stale if it is not up-to-date. Indexes become stale when documents in collections are updated or when documents are inserted/removed in/from collections. Index Maintenance is the operation by which stale index contents are updated so that the index becomes up-to-date. Zorba offers two maintenance modes: manual and automatic.

If an index is declared as "automatically maintained" (i.e. 'an:automatic'), Zorba guarantees that every time a PUL is applied, the index is made up-to-date before the upd:apply-updates function returns. Ideally, all indexes should be automatically maintained, but in general, index maintenance can be a very expensive operation performance-wise. As a result, Zorba will reject a declaration for an automatic index if it determines that it cannot maintain the index in an "efficient" way. The definition of efficiency with respect to index maintenance is implementation dependent, but in general, it means that the index can be maintained in some incremental way that is faster than simply re-creating the whole index from scratch. However, even incremental maintenance can have a high cost, which may make the manual mode described below the preferred choice.

If an index is declared as "manually maintained" (i.e. 'an:manual'), it is the responsibility of the programmers to keep the index up-to-date. This can be done using the idml:refresh-index updating function described below. Since Zorba does not take any maintenance action during PUL applications, manually maintained indexes may become stale in between calls to the idml:refresh-index function. Obviously, the manual mode must be used if an index cannot be maintained automatically. However, even for automatically maintainable indexes, the manual mode may be preferable if users can tolerate a stale index in return for better performance during updates.
  declare updating function idml:refresh-index($indexName as xs:QName)

  upd:refreshIndex($indexName as xs:QName)
The refresh-index function is evaluated as follows:
  • If the given expanded QName does not identify an index among the statically known indexes in the static context of the invoking module, an error is raised [zerr:ZDDY0021].
  • If the given expanded QName does not identify an index among the available indexes in the dynamic context of the invoking module, an error is raised [zerr:ZDDY0023].
  • The result of the function is an empty XDM instance and a pending update list that consists of a single update primitive: upd:refreshIndex($indexName).
The update primitive is applied as follows:
  • The index entry container for the index is found via the indexes table.
  • The container is made up-to-date in some implementation dependent way. In Zorba this is done by discarding the current contents and rebuilding the index from scratch (the same way as the iddl:create function populates an empty index container).
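The difference between the two maintenance modes can be modeled as follows. The Python sketch uses hypothetical names and a simplification: the automatic index is rebuilt completely after each update, whereas the text above describes incremental maintenance. The observable behavior (up-to-date versus possibly stale) is the same.

```python
def build_entries(collection, key_fn):
    """Compute the index content for the current data snapshot."""
    entries = {}
    for position, doc in enumerate(collection):
        entries.setdefault(key_fn(doc), []).append(position)
    return entries


class Index:
    def __init__(self, collection, key_fn, automatic):
        self.collection = collection
        self.key_fn = key_fn
        self.automatic = automatic
        self.entries = build_entries(collection, key_fn)

    def refresh(self):
        """Models upd:refreshIndex: discard the contents and rebuild."""
        self.entries = build_entries(self.collection, self.key_fn)

    def is_stale(self):
        return self.entries != build_entries(self.collection, self.key_fn)


def apply_updates(collection, inserted_docs, indexes):
    """Models PUL application: only automatic indexes are maintained."""
    collection.extend(inserted_docs)
    for index in indexes:
        if index.automatic:
            index.refresh()
```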

Integrity Constraints in Zorba

  ICDecl                ::=  'integrity' 'constraint' EQName (ICCollection | ICForeignKey)

  ICCollection          ::=  'on' 'collection' EQName
       ( ICCollSequence | ICCollSequenceUnique | ICCollNode )

  ICCollSequence        ::=  '$' EQName 'check' ExprSingle

  ICCollSequenceUnique  ::=  'node' '$' EQName 'check' 'unique' 'key' PathExpr

  ICCollNode            ::=  'foreach' 'node' '$' EQName 'check' ExprSingle

  ICForeignKey          ::=  'foreign' 'key' ICForeignKeySource ICForeignKeyTarget

  ICForeignKeySource    ::=  'from' ICForeignKeyValues

  ICForeignKeyTarget    ::=  'to' ICForeignKeyValues

  ICForeignKeyValues    ::=  'collection' EQName 'node' '$' QName 'key' PathExpr
Analogously to collections and indexes, Zorba defines an additional extension to XQuery library modules which allows the declaration of (static) integrity constraints (ICs). Static ICs can be used to ensure that, at every moment in time, all data stored in collections is accurate and consistent according to the semantics of an application. Note that Zorba doesn't define any dynamic integrity constraints which check the validity of a particular update. As in the relational world, Zorba defines several types of ICs: Entity, Domain and Referential ICs.

Entity ICs check for the accuracy and consistency of all nodes in a collection. For instance, a special case of the Entity IC is the IC that checks for unique keys among all nodes in a collection. The Domain IC validates that each node in a collection satisfies a given expression. The Referential IC is used to ensure a foreign key relationship between the nodes in two collections.

In this section, we describe how such ICs are declared in a library module and how a particular IC can be (de-)activated. All ICs are described using examples for the news application. Specifically, we declare ICs for the data stored in the news-data:employees and the news-data:articles collections.

Declaration


As for collections and indexes, ICs must be declared before the user can activate them. An IC declaration specifies (1) the name of the IC, which is used by the function calls that (de-)activate it (see next section), (2) the name of the collection(s) whose data should be validated, and (3) the expression(s) that guarantee the accuracy and consistency of the data. Analogously to indexes, ICs are declared inside the prolog of the library module that declares the collection(s) referenced by the IC.

Entity Integrity

An Entity IC is used to state the uniqueness of a key among all nodes of a collection. For example, the IC (named news-data:UniqueId) in the example below states that the value of the id attribute of each employee is unique among all other nodes in the news-data:employees collection.
  declare integrity constraint news-data:UniqueId
    on collection news-data:employees
    node $id check unique key $id/@id;
The name of the collection is specified after the "on collection" keyword. The path expression following the "check unique key" keyword returns the value to be checked for uniqueness. The result of this path expression must not be empty and is wrapped to return an atomic value. The variable $id is successively bound to each node of the news-data:employees collection and available in the check expression.
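
The check performed by such a constraint can be modeled in Python. The sketch assumes that nodes are dictionaries and that the key expression is passed as a callable returning an atomic value, with None standing for an empty result. It is an illustration of the rule, not of how Zorba evaluates it.

```python
def check_unique_key(collection, key_fn):
    """True if every node has a key and no two nodes share the same key."""
    seen = set()
    for node in collection:
        key = key_fn(node)
        if key is None or key in seen:
            return False
        seen.add(key)
    return True
```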

Domain Integrity

The Domain IC allows the user to specify constraints that a particular node in a collection must satisfy. Domain ICs can be used in addition to XML Schema types or if no XML schema is available.

With the following example, we want to make sure that the name of each author of an article is not the zero-length string. This can be particularly useful since there is no XML schema for articles.
  declare integrity constraint news-data:AuthorNames
    on collection news-data:articles
    foreach node $article check fn:string-length($article/author/name) != 0;
The name of the IC is news-data:AuthorNames and it is defined on nodes belonging to the news-data:articles collection. The "foreach node" expression specifies a variable (using a QName) which is bound to each node in the collection. For each such node, the check expression is executed, and the effective boolean value of its result must be true.

Referential Integrity


The Referential IC requires every value of a node in a collection to exist as a value of another node in another collection. For example, in the database of the news organization, we want to make sure that each article is maintained by an (existing) employee. This can be done by declaring a so called foreign key IC. In the following example, this IC is given the name news-data:ArticleEmployees.
  declare integrity constraint news-data:ArticleEmployees
    foreign key
      from collection news-data:articles node $x key $x/empid
      to   collection news-data:employees node $y key fn:data($y/@id);
The QNames following the "from collection" and "to collection" keywords specify the source and destination collections, respectively. The result of each key expression is wrapped to return an atomic value. For each atomic value produced by the key expression on the source collection, an equal atomic value must exist in the sequence returned by the key expression on the destination collection. The IC is violated if this is not the case for any node in the source collection. This semantics is equivalent to the following XQuery expression.
  every $x in cdml:collection(xs:QName("news-data:articles"))
  satisfies
    some $y in cdml:collection(xs:QName("news-data:employees"))
    satisfies fn:data($y/@id) eq $x/empid
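
The foreign key semantics can also be tried out without any collections. The sketch below is plain XQuery over hypothetical inline data (the variables $articles and $employees and their contents are assumptions made for illustration) and lists the source keys that have no counterpart in the destination, which is the situation the IC forbids.

```xquery
(: Hypothetical sample data standing in for the two collections. :)
let $articles := (
  <article><empid>100</empid></article>,
  <article><empid>300</empid></article>
)
let $employees := (
  <employee id="100"/>,
  <employee id="200"/>
)
(: Return every source key that has no matching destination key. :)
return
  for $x in $articles
  where fn:empty($employees[fn:data(@id) = fn:data($x/empid)])
  return fn:data($x/empid)
```

Here the query returns the single key 300, since no employee carries that id. An empty result would correspond to a satisfied constraint.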

Lifecycle Management


ICs can be checked manually (if requested by the user) or automatically when updates are applied, after validation has been performed and indexes have been maintained. In order to be checked automatically, an IC needs to be active. ICs can be (de-)activated using the two updating functions icddl:activate and icddl:deactivate, respectively. Each function takes the name of the IC to (de-)activate as its parameter. The flag indicating whether an IC is active or not is stored in the dynamic context.

Deactivating an IC might be useful if the corresponding check is expensive and temporary inconsistency of the data is acceptable, so that the data is only checked (and fixed manually) from time to time. To check an IC manually, Zorba defines an updating function called check-integrity-constraint, which triggers a check of the IC identified by the QName passed as its parameter.

Similar to collections and indexes, the module declaring the integrity constraints (i.e. the module with namespace http://www.news.org/data) can also declare variables whose values are the QNames of the ICs. This allows their names to be easily referenced by subsequent expressions. For example, such a variable can be passed as a parameter to the activate function in the importing admin-script module (see above). For the ICs from the section above, those variables are declared as follows:
  declare variable $news-data:UniqueId := xs:QName("news-data:UniqueId");
  declare variable $news-data:AuthorNames := xs:QName("news-data:AuthorNames");
  declare variable $news-data:ArticleEmployees := xs:QName("news-data:ArticleEmployees");

Extensions to the XQUF updates routines

upd:mergeUpdates


The XQuery Update Facility specification lists a number of errors that may be raised by the upd:mergeUpdates routine. Zorba adds the following error conditions to this list:
  • An error is raised [zerr:ZDDY0016] if two or more upd:createCollection primitives having the same QName as argument appear in the merged list.
  • An error is raised [zerr:ZDDY0027] if two or more upd:createIndex primitives having the same QName as argument appear in the merged list.
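
The condition behind both errors is a duplicate QName among the create primitives of one merged list. As a rough model only, the plain XQuery sketch below finds such duplicates in a hypothetical sequence of QNames; the variable $created and its contents are assumptions, and the code does not touch Zorba's actual pending update list.

```xquery
declare namespace news-data = "http://www.news.org/data";

(: Hypothetical QNames named by the upd:createCollection primitives
   of one merged pending update list. :)
let $created := (
  xs:QName("news-data:articles"),
  xs:QName("news-data:employees"),
  xs:QName("news-data:articles")
)
(: Report each QName that occurs more than once. :)
return
  for $name in fn:distinct-values($created)
  where fn:count($created[. eq $name]) gt 1
  return fn:string($name)
```

The query reports news-data:articles, the one name that appears twice. Under the rule above, a merged list like this one would cause zerr:ZDDY0016 to be raised.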

upd:applyUpdates

Appendix Error Summary

The prefix "zerr" is bound to the error namespace of Zorba, i.e. http://zorba.io/errors. The prefix "err" is bound to the general XQuery error namespace, i.e. http://www.w3.org/2005/xqt-errors.

XQuery Data Definition Facility - Static Errors:
zerr:ZDST0001, A collection with name is already declared in the same module.

zerr:ZDST0002, A collection with name that is imported from module is already declared in the importing module.

zerr:ZDST0003, A collection declaration cannot appear in a main module.

zerr:ZDST0004, The declaration for collection specifies multiple values for the same property.

zerr:ZDST0005, The declaration for collection specifies conflicting property values.

zerr:ZDST0006, The declaration for collection contains an invalid property value.

zerr:ZDST0021, An index with name is already declared in the same module.

zerr:ZDST0022, An index with name that is imported from module is already declared in the importing module.

zerr:ZDST0023, An index declaration cannot appear in a main module.

zerr:ZDST0024, The declaration for index specifies multiple values for the same property.

zerr:ZDST0025, The index cannot be declared as unique.

zerr:ZDST0026, The declaration for index contains an invalid property value.

zerr:ZDST0027, The index has an invalid key type declaration.

zerr:ZDST0028, The index has a non deterministic definition.

zerr:ZDST0029, The index references a data source that is not a collection.

zerr:ZDST0030, The index references a collection with a non-const QName.

zerr:ZDST0031, The index has free variables in its definition.

zerr:ZDST0032, The domain expression of index references the context item.

zerr:ZDST0033, The declaration of index contains a non-simple expression.

zerr:ZDST0034, Index cannot be automatically maintained.

zerr:ZDST0036, Index qname does not have the same namespace URI as the target namespace of the declaring module.

zerr:ZDST0041, An integrity constraint with URI is declared already.

zerr:ZDST0042, An integrity constraint key has multiple values.

zerr:ZDST0043, An integrity constraint key has a non-atomic value.

zerr:ZDST0045, The integrity constraint cannot be declared in a main module.

zerr:ZDST0046, The integrity constraint with URI has free variables in its definition.

zerr:ZDST0047, The integrity constraint with URI references a data source that is not a collection among the statically known collections.

zerr:ZDST0048, The integrity constraint with URI has a non deterministic definition.

XQuery Data Definition Facility - Type Errors:
zerr:XDTY0001, Collection cannot contain a node of certain type.

zerr:XDTY0010, The domain expression of index produces an item that is not a node.

zerr:XDTY0011, The result of some key expression of index does not match its declared type.

XQuery Data Definition Facility - Dynamic Errors:
zerr:ZDDY0001, Collection is not declared in the static context.

zerr:ZDDY0002, Collection exists already.

zerr:ZDDY0003, Collection does not exist.

zerr:ZDDY0004, Cannot update const collection.

zerr:ZDDY0005, Illegal insert in append-only collection.

zerr:ZDDY0006, Illegal insert in queue collection.

zerr:ZDDY0007, Illegal delete from append-only collection.

zerr:ZDDY0008, Illegal delete from queue collection.

zerr:ZDDY0009, Not all the nodes to delete are at the beginning of queue collection.

zerr:ZDDY0010, Illegal update of node in collection, whose nodes are read-only.

zerr:ZDDY0011, Node is not contained in collection.

zerr:ZDDY0012, Illegal insert in unordered collection.

zerr:ZDDY0013, Cannot delete collection because there are indexes that reference it.

zerr:ZDDY0014, Cannot delete collection because there are integrity constraints that reference it.

zerr:ZDDY0015, Cannot delete collection because there are references on its nodes.

zerr:ZDDY0016, Cannot invoke the create function multiple times with the same QName in the same snapshot.

zerr:ZDDY0020, The domain expression of index produces nodes that are not in collection.

zerr:ZDDY0021, Index is not declared in the static context.

zerr:ZDDY0022, Index exists already.

zerr:ZDDY0023, Index does not exist.

zerr:ZDDY0024, The uniqueness property of index is violated.

zerr:ZDDY0025, Invalid number of arguments in probe of index.

zerr:ZDDY0026, Index does not support range probes.

zerr:ZDDY0027, Cannot invoke the create function multiple times with the same QName in the same snapshot.

zerr:ZDDY0030, Index does not support general range probes.

zerr:ZDDY0031, An integrity constraint with URI is not declared.

zerr:ZDDY0032, An integrity constraint with URI is not activated.

zerr:ZDDY0033, Conditions for integrity constraint were not met on collection.

zerr:ZDDY0034, Index range-value probe has search keys with incompatible types.