zorba::Tokenizer::Properties

#include <zorba/tokenizer.h>

Various properties of this Tokenizer.

Public Types

std::vector< locale::iso639_1::type > languages_type

Public Attributes

bool comments_separate_tokens

If true, XML comments separate tokens.

bool elements_separate_tokens

If true, XML elements separate tokens.

languages_type languages

The set of languages supported.

bool processing_instructions_separate_tokens

If true, XML processing instructions separate tokens.

char const * uri

The URI that uniquely identifies this Tokenizer.

Public Types

languages_type

std::vector< locale::iso639_1::type > languages_type

Public Attributes

comments_separate_tokens

bool comments_separate_tokens

If true, XML comments separate tokens.

For example, net<!---->work would be 2 tokens (net, work) instead of 1 (network).

elements_separate_tokens

bool elements_separate_tokens

If true, XML elements separate tokens.

For example, <b>B</b>old would be 2 tokens (B, old) instead of 1 (Bold).

languages

languages_type languages

The set of languages supported.
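Since languages_type is a std::vector, checking whether a language is supported amounts to a linear search. The sketch below is illustrative only: the sketch namespace, the three-value lang enum, and the supports() helper are hypothetical stand-ins, not part of the Zorba API, and the real element type is locale::iso639_1::type.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

namespace sketch {

// Hypothetical stand-in for locale::iso639_1::type (assumption: the real
// type is an enumeration of ISO 639-1 language codes).
enum lang { en, de, fr };

// Mirrors the documented typedef, using the stand-in enum.
typedef std::vector<lang> languages_type;

// Hypothetical helper: true if 'wanted' appears in the language set.
bool supports(languages_type const &languages, lang wanted) {
  return std::find(languages.begin(), languages.end(), wanted) !=
         languages.end();
}

// Usage: a tokenizer that advertises English and German only.
void languages_example() {
  languages_type languages;
  languages.push_back(en);
  languages.push_back(de);
  assert(supports(languages, de));
  assert(!supports(languages, fr));
}

} // namespace sketch
```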

processing_instructions_separate_tokens

bool processing_instructions_separate_tokens

If true, XML processing instructions separate tokens.

For example, net<?PI pi?>work would be 2 tokens (net, work) instead of 1 (network).
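The effect of the three boolean flags can be illustrated with a toy token counter. This is a minimal sketch, not Zorba's tokenizer: SeparatorFlags and count_tokens() are hypothetical names, tokens are assumed to be runs of ASCII alphanumerics, and the input is assumed to be well-formed.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in holding only the three documented boolean flags;
// it is NOT zorba::Tokenizer::Properties.
struct SeparatorFlags {
  bool comments_separate_tokens;
  bool elements_separate_tokens;
  bool processing_instructions_separate_tokens;
};

// Counts alphanumeric tokens in a small XML fragment. Markup is never part
// of a token: it either splits the surrounding text (flag true) or is
// skipped as if absent (flag false).
std::size_t count_tokens(std::string const &xml, SeparatorFlags const &f) {
  std::size_t count = 0;
  bool in_token = false;
  std::size_t i = 0;
  while (i < xml.size()) {
    if (xml[i] == '<') {
      bool separates;
      std::string close;
      std::size_t open_len;
      if (xml.compare(i, 4, "<!--") == 0) {          // comment
        separates = f.comments_separate_tokens;
        close = "-->";
        open_len = 4;
      } else if (xml.compare(i, 2, "<?") == 0) {     // processing instruction
        separates = f.processing_instructions_separate_tokens;
        close = "?>";
        open_len = 2;
      } else {                                       // element start/end tag
        separates = f.elements_separate_tokens;
        close = ">";
        open_len = 1;
      }
      std::size_t end = xml.find(close, i + open_len);
      // Unterminated markup: treat the rest of the input as markup.
      i = (end == std::string::npos) ? xml.size() : end + close.size();
      if (separates)
        in_token = false;
    } else {
      if (std::isalnum(static_cast<unsigned char>(xml[i]))) {
        if (!in_token) {
          ++count;
          in_token = true;
        }
      } else {
        in_token = false;
      }
      ++i;
    }
  }
  return count;
}

// Usage: the three examples from the member descriptions.
void separator_flags_example() {
  SeparatorFlags const all_on = { true, true, true };
  SeparatorFlags const all_off = { false, false, false };
  assert(count_tokens("net<!---->work", all_on) == 2);
  assert(count_tokens("net<!---->work", all_off) == 1);
  assert(count_tokens("<b>B</b>old", all_on) == 2);
  assert(count_tokens("net<?PI pi?>work", all_off) == 1);
}
```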

uri

char const * uri

The URI that uniquely identifies this Tokenizer.
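Because uri is a char const *, comparing two URIs with == compares pointers, not contents. A minimal sketch of a content comparison follows; same_tokenizer_uri() is a hypothetical helper, not part of the Zorba API, and the URIs used are made-up examples.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: true when both C strings hold the same URI text.
bool same_tokenizer_uri(char const *a, char const *b) {
  if (a == b)
    return true;             // same pointer (also covers both null)
  if (!a || !b)
    return false;            // exactly one is null
  return std::strcmp(a, b) == 0;
}

// Usage: two distinct buffers with equal contents compare as the same URI.
void uri_example() {
  char const first[] = "urn:example:tokenizer";
  char const second[] = "urn:example:tokenizer";
  assert(same_tokenizer_uri(first, second));
  assert(!same_tokenizer_uri(first, "urn:example:other"));
}
```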