Writing Your Own External Functions

External Functions

In XQuery, a module may declare both user-defined and external functions (see http://www.w3.org/TR/xquery-30/#FunctionDeclns). User-defined functions are implemented as XQuery expressions and their implementation is provided together with their declaration. In contrast, external functions are typically implemented in a host language other than XQuery, and their implementation is not inside the declaring module. As a result, to support external functions, XQuery processors must provide mechanisms by which (1) the implementation of an external function can be located, (2) values for function parameters are passed from the XQuery runtime environment to the host language, and (3) the result of the function is passed back from the host language to the XQuery runtime environment. We call step (1) external function resolution. Steps (2) and (3) are part of external function invocation.

In Zorba, the C++ API provides the "glue" between the XQuery processor and the hosting environment in which external functions are implemented. Related code examples can be found here.

Implementation

In Zorba, external functions must be implemented as instances of the ExternalFunction class. We refer to such instances as external function objects. During its evaluation, an external function may or may not need to access the static or dynamic contexts of the invoking XQuery module. If the function implementation does need to access either context, the function is referred to as contextual; otherwise, it is non-contextual. Zorba provides classes ContextualExternalFunction and NonContextualExternalFunction to differentiate between contextual and non-contextual external functions, respectively. Both are abstract subclasses of ExternalFunction and provide a (virtual) evaluate() method that serves as the implementation of an external function. For each external function, an application must provide a concrete subclass of either of these classes.

Invocation

Invoking an external function boils down to invoking the evaluate() method on the associated function object. The first parameter of evaluate() is a vector of pointers to ItemSequence objects. During invocation, this vector contains one entry for each parameter listed in the external function declaration. The Zorba XQuery processor makes sure that the types of the item sequences given to evaluate() match the types of the formal parameters. Similarly, evaluate() returns the result of the function as an ItemSequence, and Zorba makes sure that the type of the returned ItemSequence matches the declared return type of the function. If the function is contextual, its evaluate() method has two additional parameters: pointers to the static and dynamic contexts of the module declaring the function.
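
As a rough illustration of the implementation and invocation pattern described above, the following self-contained C++ sketch mirrors the class hierarchy with simplified stand-in types. Everything lives in a hypothetical sketch namespace and does not use Zorba's real headers: ItemSequence is reduced to a list of integers, the two context types are empty placeholders, and localName(), SumFunction, and the exact evaluate() signatures are assumptions made for illustration only.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

namespace sketch {

// Stand-in for Zorba's ItemSequence: here just a list of integers.
struct ItemSequence {
  std::vector<long> items;
};
typedef std::unique_ptr<ItemSequence> ItemSequencePtr;

// One entry per parameter listed in the external function declaration.
typedef std::vector<const ItemSequence*> Arguments;

// Empty placeholders for the static and dynamic contexts.
struct StaticContext {};
struct DynamicContext {};

// Common base of all external function objects.
class ExternalFunction {
public:
  virtual ~ExternalFunction() {}
  virtual std::string localName() const = 0;  // hypothetical accessor
};

// A non-contextual function only sees its arguments.
class NonContextualExternalFunction : public ExternalFunction {
public:
  virtual ItemSequencePtr evaluate(const Arguments& args) const = 0;
};

// A contextual function additionally receives both contexts.
class ContextualExternalFunction : public ExternalFunction {
public:
  virtual ItemSequencePtr evaluate(const Arguments& args,
                                   const StaticContext* sctx,
                                   const DynamicContext* dctx) const = 0;
};

// Hypothetical two-parameter function: sums every integer it is given.
class SumFunction : public NonContextualExternalFunction {
public:
  std::string localName() const override { return "sum"; }

  ItemSequencePtr evaluate(const Arguments& args) const override {
    // The processor is expected to pass exactly one sequence per parameter.
    assert(args.size() == 2);
    long total = 0;
    for (const ItemSequence* seq : args)
      for (long value : seq->items)
        total += value;
    // The result is handed back as an item sequence as well.
    ItemSequencePtr result(new ItemSequence);
    result->items.push_back(total);
    return result;
  }
};

} // namespace sketch
```

In this sketch, a caller playing the role of the processor builds an Arguments vector holding one ItemSequence pointer per declared parameter and calls evaluate() on the function object. The type checks that Zorba performs on arguments and results are not modeled here.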

Resolution

Before it can invoke an external function, Zorba must locate its function object. To do so, Zorba looks up the function object in the static context, using the function QName as the key. However, it is the responsibility of the application to register the function objects with the static context. For this, Zorba provides the ExternalModule abstract class. ExternalModule represents a group of external functions, all belonging to the same XQuery module (and thus having the same target namespace). It provides the interface for retrieving the function object of each contained external function given the function's QName. Applications are responsible for implementing concrete subclasses of ExternalModule and for registering instances of such subclasses into the static context. Registration can be done in either of the following two ways.
  1. An ExternalModule object can be registered explicitly using the registerModule() method of StaticContext. In this case, the application retains memory ownership of the ExternalModule object, and must free it when it is no longer needed.
  2. An implementation of ExternalModule and its associated ExternalFunctions can be packaged into a dynamic library (dll, so, or dylib). The dynamic library must have a global function named createModule() as an entry point, and createModule() must return a pointer to an ExternalModule object. Assuming a library has been named and placed appropriately, Zorba will automatically load it when it compiles the XQuery module with the same target namespace. It will call createModule() to create an ExternalModule object, and register it into the static context. In this case, Zorba assumes ownership of the ExternalModule object, and will destroy it when the containing static context is destroyed.

Under most circumstances, a module author using this second technique will use the CMake-based DECLARE_ZORBA_MODULE() framework to build the external module code and install it appropriately. Details of this mechanism can be found here: External Functions in C++. With that framework, you do not need to worry about how to build, name, or locate the shared object, as the framework takes care of that for you.
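
The two registration paths can be sketched in self-contained C++ as follows. This is only an approximation under stated assumptions: the types in the hypothetical sketch namespace are simplified stand-ins, not Zorba's real classes. Only the names ExternalModule, StaticContext, registerModule(), and createModule() come from the text above. The members targetNamespace(), lookup(), and resolve(), the UtilsSum and UtilsModule classes, and the extern "C" linkage of the entry point are assumptions made for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

namespace sketch {

// Minimal stand-in for an external function object.
class ExternalFunction {
public:
  virtual ~ExternalFunction() {}
  virtual std::string localName() const = 0;  // hypothetical accessor
};

// Stand-in for ExternalModule: groups the functions of one target namespace.
class ExternalModule {
public:
  virtual ~ExternalModule() {}
  virtual std::string targetNamespace() const = 0;                   // hypothetical
  virtual ExternalFunction* lookup(const std::string& localName) = 0; // hypothetical
};

// Hypothetical function belonging to the example module.
class UtilsSum : public ExternalFunction {
public:
  std::string localName() const override { return "sum"; }
};

// Concrete module for the example namespace; it owns its function objects.
class UtilsModule : public ExternalModule {
public:
  std::string targetNamespace() const override {
    return "http://www.example.com/modules/utils";
  }
  ExternalFunction* lookup(const std::string& localName) override {
    return localName == sum_.localName() ? &sum_ : nullptr;
  }
private:
  UtilsSum sum_;
};

// Stand-in static context: a non-owning map from namespace to module.
class StaticContext {
public:
  // Way 1: explicit registration; the caller keeps ownership of the module.
  void registerModule(ExternalModule* module) {
    assert(module != nullptr);
    modules_[module->targetNamespace()] = module;
  }
  // Resolution: the QName (namespace + local name) is the lookup key.
  ExternalFunction* resolve(const std::string& ns,
                            const std::string& localName) const {
    std::map<std::string, ExternalModule*>::const_iterator it = modules_.find(ns);
    return it == modules_.end() ? nullptr : it->second->lookup(localName);
  }
private:
  std::map<std::string, ExternalModule*> modules_;
};

} // namespace sketch

// Way 2: entry point a dynamic library would export. The extern "C" linkage
// is an assumption here, used to keep the symbol name unmangled.
extern "C" sketch::ExternalModule* createModule() {
  return new sketch::UtilsModule;  // whoever loads the library takes ownership
}
```

In the first path the application creates a UtilsModule itself, passes its address to registerModule(), and frees it later. In the second path the loader calls createModule() and is then responsible for destroying the returned object.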

Manual Shared Object Resolution

For reference, the process that Zorba uses internally to locate the shared object when compiling a module with external functions is described below. If for some reason you do not wish to use the DECLARE_ZORBA_MODULE() framework, you will need this information to place the shared object such that Zorba can load it at runtime.

To locate a dynamic library, Zorba first transforms its target namespace URI to a relative file path and then uses the "Library Path" mechanism (described in Zorba's Library Path) to turn this relative path into the absolute path name of the dynamic library file. The transformation of the URI to a relative path is done using the following steps. In describing the steps, we will use the URI "http://www.example.com/modules/utils" as an example and assume we are working with a Linux system.
  1. The domain component of the URI is extracted and transformed into a path notation by replacing its "." characters (if any) with forward slashes and reversing the order of the path steps. The result of this step on the example URI is "com/example/www".
  2. The path component of the URI is extracted and separated into a branch name and a file name: (a) if the path component does not contain any "/" characters, the branch name is empty and the file name is the full path component, else (b) if the path component ends with a "/", the branch name is the full path component and the file name is empty, else (c) the file name is set to the last step of the path component (the substring after the last "/") and the branch name is set to the path component minus the last step. The branch name is then appended to the result of the previous step. The result of this step on the example URI is "com/example/www/modules/".
  3. On Unix or Mac systems, the string "lib" is appended to the result of the previous step. The result of this step on the example URI is "com/example/www/modules/lib".
  4. The file name is appended to the result of the previous step. The result of this step on the example URI is "com/example/www/modules/libutils".
  5. If the XQuery module being imported contains a version option, the character "_" followed by the version string is appended. Assuming that the module being imported via the example URI has a version option with the value "1.2", the result of this step on the example URI is "com/example/www/modules/libutils_1.2".
  6. Finally, the appropriate suffix is appended to the file name: ".so" for Unix, ".dll" for Windows, or ".dylib" for MacOS. The result of this step on the example URI is "com/example/www/modules/libutils_1.2.so".
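
The six steps above can be sketched as a small standalone C++ function. This is an illustrative reading of the steps, not Zorba's actual implementation: the function name libraryRelativePath, the Platform enum, and the handling of the version as a plain string argument are all assumptions. Inputs that the steps do not address (for example, a URI without a scheme) are simply rejected by an assertion.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical platform selector; not part of Zorba's API.
enum class Platform { Unix, Windows, MacOS };

// Hypothetical helper following steps 1-6; pass "" when there is no version.
std::string libraryRelativePath(const std::string& uri,
                                const std::string& version,
                                Platform platform) {
  const std::string::size_type npos = std::string::npos;

  // Split "scheme://domain/path" into its domain and path components.
  const std::string::size_type schemeEnd = uri.find("://");
  assert(schemeEnd != npos && "expected a URI of the form scheme://domain/path");
  const std::string::size_type domainStart = schemeEnd + 3;
  const std::string::size_type pathStart = uri.find('/', domainStart);
  const std::string domain = (pathStart == npos)
      ? uri.substr(domainStart)
      : uri.substr(domainStart, pathStart - domainStart);
  const std::string path = (pathStart == npos) ? "" : uri.substr(pathStart);

  // Step 1: split the domain on ".", reverse the parts, join with "/".
  std::vector<std::string> labels;
  std::istringstream in(domain);
  for (std::string label; std::getline(in, label, '.'); )
    labels.push_back(label);
  std::reverse(labels.begin(), labels.end());
  std::string result;
  for (std::size_t i = 0; i < labels.size(); ++i) {
    if (i > 0) result += '/';
    result += labels[i];
  }

  // Step 2: separate the path into branch name and file name.
  std::string branch, file;
  const std::string::size_type lastSlash = path.rfind('/');
  if (lastSlash == npos) {
    file = path;                             // (a) no "/" at all
  } else if (lastSlash + 1 == path.size()) {
    branch = path;                           // (b) path ends with "/"
  } else {
    branch = path.substr(0, lastSlash + 1);  // (c) keep the trailing "/"
    file = path.substr(lastSlash + 1);
  }
  result += branch;

  // Step 3: the "lib" prefix is added on Unix and Mac systems only.
  if (platform != Platform::Windows) result += "lib";

  // Step 4: append the file name.
  result += file;

  // Step 5: append "_" and the version string if a version is given.
  if (!version.empty()) result += "_" + version;

  // Step 6: append the platform-specific suffix.
  switch (platform) {
    case Platform::Unix:    result += ".so";    break;
    case Platform::Windows: result += ".dll";   break;
    case Platform::MacOS:   result += ".dylib"; break;
  }
  return result;
}
```

Calling libraryRelativePath("http://www.example.com/modules/utils", "1.2", Platform::Unix) yields "com/example/www/modules/libutils_1.2.so", which matches the result of step 6. Turning this relative path into an absolute one is left to the Library Path mechanism and is not modeled here.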