XQuery Function Documentation


http://exist-db.org/xquery/text

java:org.exist.xquery.functions.text.TextModule

A module for text searching extension functions.

text:filter

text:filter($text as xs:string, $regularexpression as xs:string) as xs:string*

Filters $text, returning the substrings that match the regular expression $regularexpression.

Parameters:
$text The text to filter
$regularexpression The regular expression to match against the text
Returns:
xs:string* : the substrings
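As a rough illustration of the behaviour described above, here is a small Python model of text:filter. It is not eXist code: it assumes the function returns every non-overlapping match in order of occurrence, the name text_filter is hypothetical, and Python's re module stands in for the regular-expression engine eXist uses.

```python
import re


def text_filter(text: str, pattern: str) -> list[str]:
    """Hypothetical model of text:filter($text, $regularexpression).

    Assumption: every non-overlapping match is returned, in document order.
    Python's `re` syntax stands in for eXist's regular-expression dialect.
    """
    # group(0) is the whole matched substring, even if the pattern has groups
    return [m.group(0) for m in re.finditer(pattern, text)]


print(text_filter("order 12, items 7 and 305", r"\d+"))  # ['12', '7', '305']
```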

text:filter-nested

text:filter-nested($node-set as node()*) as node()*

Removes from the node set every node that has a descendant node in the same node set. This is useful if you do a combined query like //(a|b)[. &= $terms] and some 'b' nodes are nested within 'a' nodes, but you only want to see the innermost matches, i.e. the 'b' nodes, not the 'a' nodes containing 'b' nodes.

Parameters:
$node-set* The node set
Returns:
node()* : a node set containing only those nodes that have no descendant nodes in the input node set.
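The filtering rule can be sketched in Python as follows. This is a model only: nodes are identified here by their path of child indexes from the root (an assumption of the sketch, not eXist's internal node representation), so a node's descendants are exactly the longer paths that begin with its own path. The function name filter_nested is hypothetical.

```python
def filter_nested(node_ids: list[tuple[int, ...]]) -> list[tuple[int, ...]]:
    """Hypothetical model of text:filter-nested.

    Keeps only the nodes that have no descendant in the same node set.
    Assumption: each node is a tuple of child indexes from the root.
    """
    def has_descendant_in_set(node: tuple[int, ...]) -> bool:
        # A descendant's path is longer and starts with this node's path
        return any(
            len(other) > len(node) and other[: len(node)] == node
            for other in node_ids
        )

    return [node for node in node_ids if not has_descendant_in_set(node)]


# (1,) plays the role of an 'a' node, (1, 0) a 'b' nested in it, (2,) a lone 'b'
print(filter_nested([(1,), (1, 0), (2,)]))  # [(1, 0), (2,)]
```

The pairwise check is quadratic, which keeps the sketch short; a real implementation working on sorted node identifiers could do this in a single pass.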

text:fuzzy-index-terms

text:fuzzy-index-terms($term as xs:string?) as xs:string*

Compares the specified argument against the contents of the fulltext index. Returns a sequence of strings which are similar to the argument. Similarity is based on Levenshtein distance. This function may not be useful in its current form and is subject to change.

Parameters:
$term? The term
Returns:
xs:string* : a sequence of strings which are similar to the argument $term

text:fuzzy-match-all

text:fuzzy-match-all($source as node()*, $keyword as xs:string) as node()*

Fuzzy keyword search, which compares strings based on the Levenshtein distance (or edit distance). The function tries to match each of the keywords specified in the keyword string against the string value of each item in the sequence $source.

Parameters:
$source* The source
$keyword The keyword string
Returns:
node()* : the sequence of nodes that match the keywords

text:fuzzy-match-any

text:fuzzy-match-any($source as node()*, $keyword as xs:string) as node()*

Fuzzy keyword search, which compares strings based on the Levenshtein distance (or edit distance). The function tries to match any of the keywords specified in the keyword string against the string value of each item in the sequence $source.

Parameters:
$source* The source
$keyword The keyword string
Returns:
node()* : the sequence of nodes that match the keywords
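To contrast fuzzy-match-all and fuzzy-match-any, here is a Python sketch built on the Levenshtein (edit) distance. It is not eXist's implementation. Three things are assumptions of the sketch: string values are split into lower-cased whitespace-separated tokens, a keyword matches a node if some token lies within a hypothetical max_distance of it, and the helper names levenshtein, keyword_found and fuzzy_match are hypothetical. The similarity threshold eXist actually applies is not documented here.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def keyword_found(keyword: str, value: str, max_distance: int) -> bool:
    # Assumption: a keyword matches if any whitespace-separated token is close enough
    return any(levenshtein(keyword, token) <= max_distance
               for token in value.lower().split())


def fuzzy_match(values: list[str], keywords: str, mode: str,
                max_distance: int = 1) -> list[str]:
    """mode "all" models fuzzy-match-all, mode "any" models fuzzy-match-any."""
    combine = {"all": all, "any": any}[mode]
    terms = keywords.lower().split()
    return [v for v in values
            if combine(keyword_found(t, v, max_distance) for t in terms)]


values = ["The quick brown fox", "a lazy dog", "quack"]
print(fuzzy_match(values, "quick fox", "all"))  # ['The quick brown fox']
print(fuzzy_match(values, "quick fox", "any"))  # ['The quick brown fox', 'quack']
```

"quack" is one substitution away from "quick", so it satisfies the any-mode search, but it has no token close to "fox" and is therefore rejected in all-mode.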

text:groups

text:groups($text as xs:string, $regularexpression as xs:string) as xs:string*

Tries to match the string in $text against the regular expression. Returns an empty sequence if the string does not match, or a sequence whose first item is the entire string, and whose following items are the matched groups.

Parameters:
$text The text to match
$regularexpression The regular expression to match against the text
Returns:
xs:string* : an empty sequence if the string does not match, or a sequence whose first item is the entire string, and whose following items are the matched groups.

text:groups

text:groups($text as xs:string, $regularexpression as xs:string, $flags as xs:string) as xs:string*

Tries to match the string in $text against the regular expression, using the flags specified. Returns an empty sequence if the string does not match, or a sequence whose first item is the entire string, and whose following items are the matched groups.

Parameters:
$text The text to match
$regularexpression The regular expression to match against the text
$flags The flags
Returns:
xs:string* : an empty sequence if the string does not match, or a sequence whose first item is the entire string, and whose following items are the matched groups.
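The shape of the text:groups result can be modelled in Python as shown below. This sketch is not eXist code. The name groups is hypothetical, and it uses re.search, so the first item is the whole matched text; with a pattern that spans the entire input, as in the usage line, that is the entire string, as the description states. Mapping an optional group that did not participate to "" is also an assumption.

```python
import re


def groups(text: str, pattern: str) -> list[str]:
    """Hypothetical model of text:groups($text, $regularexpression).

    Returns [] if there is no match, else [whole match, group 1, group 2, ...].
    """
    m = re.search(pattern, text)
    if m is None:
        return []
    # Assumption: an optional group that did not take part in the match becomes ""
    return [m.group(0)] + [g if g is not None else "" for g in m.groups()]


print(groups("2024-01-15", r"(\d{4})-(\d{2})-(\d{2})"))  # ['2024-01-15', '2024', '01', '15']
print(groups("no date here", r"(\d{4})-(\d{2})"))        # []
```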

text:groups-regex

text:groups-regex($text as xs:string, $regularexpression as xs:string) as xs:string*

Tries to match the string in $text against the regular expression. Returns an empty sequence if the string does not match, or a sequence whose first item is the entire string, and whose following items are the matched groups. Note:
The groups-regex() variants of the groups() functions are identical except that they avoid the translation of the specified regular expression from XPath2 to Java syntax. That is, the regular expression is evaluated as is, and must be valid according to Java regular expression syntax, rather than the more restrictive XPath2 syntax.

Parameters:
$text The text to match
$regularexpression The regular expression to match against the text
Returns:
xs:string* : an empty sequence if the string does not match, or a sequence whose first item is the entire string, and whose following items are the matched groups.

text:groups-regex

text:groups-regex($text as xs:string, $regularexpression as xs:string, $flags as xs:string) as xs:string*

Tries to match the string in $text against the regular expression, using the flags specified. Returns an empty sequence if the string does not match, or a sequence whose first item is the entire string, and whose following items are the matched groups. Note:
The groups-regex() variants of the groups() functions are identical except that they avoid the translation of the specified regular expression from XPath2 to Java syntax. That is, the regular expression is evaluated as is, and must be valid according to Java regular expression syntax, rather than the more restrictive XPath2 syntax.

Parameters:
$text The text to match
$regularexpression The regular expression to match against the text
$flags The flags
Returns:
xs:string* : an empty sequence if the string does not match, or a sequence whose first item is the entire string, and whose following items are the matched groups.

text:highlight-matches

text:highlight-matches($source as text()*, $callback-function-ref as function, $parameters as item()*) as node()*

Highlights matching strings within text nodes that resulted from a fulltext search. When searching with one of the fulltext operators or functions, eXist keeps track of the fulltext matches within the text. Usually, the serializer marks those matches by enclosing them in an 'exist:match' element. One can then use an XSLT stylesheet to replace those match elements and highlight matches for the user. However, this is not always possible, so instead of post-processing the serialized output with XSLT, the highlight-matches function provides direct access to the matching portions of the text within XQuery.

The function takes a sequence of text nodes as its first argument, $source, and a callback function (defined with util:function) as its second argument. $parameters may contain a sequence of additional values that will be passed to the callback function's third parameter. Text nodes without matches are returned as they are. However, if the text contains a match marker, the matching character sequence is reported to the callback function, and the result of the function call is inserted into the resulting node set where the matching sequence occurred. For example, you can use this to mark all matching terms with a <span class="highlight">abc</span>.

Parameters:
$source* The sequence of text nodes
$callback-function-ref The callback function (defined with util:function)
$parameters* The sequence of additional values that will be passed to the callback function's third parameter.
Returns:
node()* : the source with the added highlights
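The splice-and-callback mechanism can be sketched in Python. This is a model, not eXist code. In eXist the match positions come from the fulltext search; here they are passed explicitly as hypothetical (offset, length) pairs, assumed not to overlap. The callback argument order (matched text, text node, parameters) is likewise an assumption, borrowed from the description of text:kwic-display. The names highlight_matches and span are hypothetical.

```python
from typing import Any, Callable


def highlight_matches(
    text: str,
    matches: list[tuple[int, int]],
    callback: Callable[[str, str, Any], Any],
    parameters: Any,
) -> list[Any]:
    """Split `text` at each (offset, length) match and let `callback` replace it.

    Assumption: matches do not overlap.
    """
    if not matches:
        return [text]  # text without matches is returned as it is
    result: list[Any] = []
    position = 0
    for offset, length in sorted(matches):
        if offset > position:
            result.append(text[position:offset])  # unmatched text before the match
        result.append(callback(text[offset:offset + length], text, parameters))
        position = offset + length
    if position < len(text):
        result.append(text[position:])  # trailing unmatched text
    return result


def span(term: str, node: str, css_class: str) -> str:
    # Hypothetical callback: wrap the matched term in a highlighting element
    return f'<span class="{css_class}">{term}</span>'


print(highlight_matches("the quick fox", [(4, 5)], span, "highlight"))
# ['the ', '<span class="highlight">quick</span>', ' fox']
```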

text:index-terms

text:index-terms($nodes as node()*, $start as xs:string?, $function as function, $returnMax as xs:int) as item()*

This function can be used to collect information about the distribution of index terms within a set of nodes, specified in the first argument $nodes. The function returns term frequencies for all terms in the index found in descendants of the nodes in $nodes. If $nodes is the empty sequence, all terms in the index are selected. The second argument $start specifies a start string: only terms starting with that character sequence are returned. $function is a function reference pointing to a callback function that will be called for every term occurrence. $returnMax defines the maximum number of terms to report.

The function reference for $function can be created with the util:function function. It can be an arbitrary user-defined function, but it should take exactly 2 arguments: 1) the current term as found in the index, as xs:string; 2) a sequence containing four int values: a) the overall frequency of the term within the node set, b) the number of distinct documents in the node set the term occurs in, c) the current position of the term in the whole list of terms returned, d) the rank of the current term in the whole list of terms returned.

Parameters:
$nodes* The set of nodes in which the returned tokens occur
$start? The optional start string
$function The callback function reference
$returnMax The maximum number of terms to report
Returns:
item()* : the results from the evaluation of the function reference
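The callback contract can be illustrated with a Python model. It is not eXist's implementation, and the ordering rules are assumptions: terms are reported in alphabetical (index) order, $returnMax keeps the first terms in that order, and rank is computed by descending frequency among the reported terms. Documents are represented as hypothetical lists of tokens, and the name index_terms is hypothetical.

```python
from collections import Counter
from typing import Any, Callable


def index_terms(
    docs: dict[str, list[str]],
    start: str,
    callback: Callable[[str, list[int]], Any],
    return_max: int,
) -> list[Any]:
    """Call callback(term, [frequency, doc_count, position, rank]) per term.

    `start` = "" means no prefix restriction. Ordering rules are assumptions.
    """
    frequency: Counter = Counter()
    doc_count: Counter = Counter()
    for tokens in docs.values():
        for term in tokens:
            if term.startswith(start):
                frequency[term] += 1  # every occurrence counts
        for term in set(tokens):
            if term.startswith(start):
                doc_count[term] += 1  # each document counts once per term
    # Assumption: report in index (alphabetical) order, rank by frequency
    ordered = sorted(frequency)[:return_max]
    by_frequency = sorted(ordered, key=lambda t: (-frequency[t], t))
    rank = {term: i for i, term in enumerate(by_frequency, start=1)}
    return [
        callback(term, [frequency[term], doc_count[term], position, rank[term]])
        for position, term in enumerate(ordered, start=1)
    ]


docs = {"d1": ["apple", "apex", "apple"], "d2": ["apple", "banana"]}
print(index_terms(docs, "ap", lambda term, info: (term, info), 10))
# [('apex', [1, 1, 1, 2]), ('apple', [3, 2, 2, 1])]
```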

text:index-terms

text:index-terms($nodes as node()*, $qnames as xs:QName+, $start as xs:string?, $function as function, $returnMax as xs:int) as item()*

This version of the index-terms function is to be used with indexes that were defined on a specific element or attribute QName. The second argument, $qnames, lists the QNames of elements or attributes for which occurrences should be returned. Otherwise, the function behaves like the 4-argument version.

Parameters:
$nodes* The set of nodes in which the returned tokens occur
$qnames+ One or more element or attribute names for which index terms are returned
$start? The optional start string
$function The callback function reference
$returnMax The maximum number of terms to report
Returns:
item()* : the results from the evaluation of the function reference

text:kwic-display

text:kwic-display($text as text()*, $width as xs:positiveInteger, $callback-function as function, $parameters as item()*) as node()*

This function takes a sequence of text nodes in $text, containing matches from a fulltext search. It highlights matching strings within those text nodes in the same way as the text:highlight-matches function. However, only a defined portion of the text surrounding the first match (and possibly following matches) is returned. If the text preceding the first match is longer than the width specified in the second argument $width, it is truncated to no more than (width - keyword-length) / 2 characters. Likewise, the text following the match is truncated so that the whole string sequence fits into width characters.

The third parameter, $callback-function, is a callback function (defined with util:function). $parameters may contain an additional sequence of values that will be passed to the last parameter of the callback function. Any matching character sequence is reported to the callback function, and the result of the function call is inserted into the resulting node set where the matching sequence occurred. For example, you can use this to mark all matching terms with a <span class="highlight">abc</span>. The callback function should take 3 or 4 arguments: 1) the text sequence corresponding to the match, as xs:string; 2) the text node to which this match belongs; 3) the sequence passed as last argument to kwic-display. If the callback function accepts 4 arguments, the last argument will contain additional information on the match as a sequence of 4 integers, including: a) the number of the match if there is more than one match in a text node (the first match is numbered 1); b) the offset of the match into the original text node string; c) the length of the match as reported by the index.

Parameters:
$text* The text nodes
$width The maximum number of characters in the returned text
$callback-function The callback function
$parameters* The parameters passed into the last argument of the callback function
Returns:
node()* : the results
Deprecated:
Improved kwic functionality is now provided by a separate XQuery module, see http://www.exist-db.org/exist/apps/doc/kwic.xml. This function could be removed in the next major release version.

text:kwic-display

text:kwic-display($text as text()*, $width as xs:positiveInteger, $callback-function as function, $result-callback as function, $parameters as item()*) as node()*

This function takes a sequence of text nodes in $text, containing matches from a fulltext search. It highlights matching strings within those text nodes in the same way as the text:highlight-matches function. However, only a defined portion of the text surrounding the first match (and possibly following matches) is returned. If the text preceding the first match is longer than the width specified in the second argument $width, it is truncated to no more than (width - keyword-length) / 2 characters. Likewise, the text following the match is truncated so that the whole string sequence fits into width characters.

The third parameter, $callback-function, is a callback function (defined with util:function). $parameters may contain an additional sequence of values that will be passed to the last parameter of the callback function. Any matching character sequence is reported to the callback function, and the result of the function call is inserted into the resulting node set where the matching sequence occurred. For example, you can use this to mark all matching terms with a <span class="highlight">abc</span>. The callback function should take 3 or 4 arguments: 1) the text sequence corresponding to the match, as xs:string; 2) the text node to which this match belongs; 3) the sequence passed as last argument to kwic-display. If the callback function accepts 4 arguments, the last argument will contain additional information on the match as a sequence of 4 integers, including: a) the number of the match if there is more than one match in a text node (the first match is numbered 1); b) the offset of the match into the original text node string; c) the length of the match as reported by the index.

Parameters:
$text* The text nodes
$width The maximum number of characters in the returned text
$callback-function The callback function
$result-callback The result callback function
$parameters* The parameters passed into the last argument of the callback function
Returns:
node()* : the results
Deprecated:
Improved kwic functionality is now provided by a separate XQuery module, see http://www.exist-db.org/exist/apps/doc/kwic.xml. This function could be removed in the next major release version.
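The truncation arithmetic described for kwic-display can be sketched in Python. This is a simplified reading, not eXist code: the text before the match is always limited to (width - keyword-length) / 2 characters (integer division is assumed), the text after the match fills whatever remains of width, and no ellipsis markers are added. The name kwic is hypothetical.

```python
def kwic(text: str, offset: int, length: int, width: int) -> tuple[str, str, str]:
    """Return (before, match, after) trimmed so the three parts fit in `width`.

    Simplified model: integer division, no ellipsis markers (assumptions).
    """
    before_budget = max(0, (width - length) // 2)  # (width - keyword-length) / 2
    before = text[max(0, offset - before_budget):offset]
    match = text[offset:offset + length]
    # The following text gets whatever is left of the width
    after_budget = max(0, width - len(before) - length)
    after = text[offset + length:offset + length + after_budget]
    return before, match, after


text = "The quick brown fox jumps over the lazy dog"
print(kwic(text, text.index("fox"), 3, 15))  # ('brown ', 'fox', ' jumps')
```

When the match sits near the start of the text, less preceding context is available, so the following context grows to use the remaining width: kwic(text, 4, 5, 15) yields ("The ", "quick", " brown").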

text:make-token

text:make-token($text as xs:string) as xs:string*

Splits a string into tokens.

Parameters:
$text The string to tokenize
Returns:
xs:string* : a sequence of tokens

text:match-all

text:match-all($source as node()*, $regular-expression as xs:string+) as node()*

Tries to match each of the regular expression strings against the keywords contained in the fulltext index. The keywords found are then compared to the node set in $source. Every node containing ALL of the keywords is copied to the result sequence. By default, a keyword is considered to match the pattern if any substring of the keyword matches. To change this behaviour, use the 3-argument version of the function and specify the flag 'w'. With 'w' specified, the regular expression is matched against the entire keyword; for example, 'explain.*' will match 'explained' but not 'unexplained'.

Parameters:
$source* The node set that is to be searched for the keyword set
$regular-expression+ The regular expressions to be matched against the fulltext index
Returns:
node()* : the sequence of all of the matching nodes

text:match-all

text:match-all($source as node()*, $regular-expression as xs:string+, $flag as xs:string) as node()*

Tries to match each of the regular expression strings against the keywords contained in the fulltext index. The keywords found are then compared to the node set in $source. Every node containing ALL of the keywords is copied to the result sequence. By default, a keyword is considered to match the pattern if any substring of the keyword matches. To change this behaviour, specify the flag 'w' in $flag. With 'w' specified, the regular expression is matched against the entire keyword; for example, 'explain.*' will match 'explained' but not 'unexplained'.

Parameters:
$source* The node set that is to be searched for the keyword set
$regular-expression+ The regular expressions to be matched against the fulltext index
$flag With 'w' specified, the regular expression is matched against the entire keyword; for example, 'explain.*' will match 'explained' but not 'unexplained'.
Returns:
node()* : the sequence of all of the matching nodes

text:match-any

text:match-any($source as node()*, $regular-expression as xs:string+) as node()*

Tries to match each of the regular expression strings against the keywords contained in the fulltext index. The keywords found are then compared to the node set in $source. Every node containing ANY of the keywords is copied to the result sequence. By default, a keyword is considered to match the pattern if any substring of the keyword matches. To change this behaviour, use the 3-argument version of the function and specify the flag 'w'. With 'w' specified, the regular expression is matched against the entire keyword; for example, 'explain.*' will match 'explained' but not 'unexplained'.

Parameters:
$source* The node set that is to be searched for the keyword set
$regular-expression+ The regular expressions to be matched against the fulltext index
Returns:
node()* : the sequence of all of the matching nodes

text:match-any

text:match-any($source as node()*, $regular-expression as xs:string+, $flag as xs:string) as node()*

Tries to match each of the regular expression strings against the keywords contained in the fulltext index. The keywords found are then compared to the node set in $source. Every node containing ANY of the keywords is copied to the result sequence. By default, a keyword is considered to match the pattern if any substring of the keyword matches. To change this behaviour, specify the flag 'w' in $flag. With 'w' specified, the regular expression is matched against the entire keyword; for example, 'explain.*' will match 'explained' but not 'unexplained'.

Parameters:
$source* The node set that is to be searched for the keyword set
$regular-expression+ The regular expressions to be matched against the fulltext index
$flag With 'w' specified, the regular expression is matched against the entire keyword; for example, 'explain.*' will match 'explained' but not 'unexplained'.
Returns:
node()* : the sequence of all of the matching nodes
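The difference between the ALL and ANY variants, and the effect of the 'w' flag, can be modelled in Python. This is one plausible reading, not eXist's implementation: the sketch treats "ALL of the keywords" as "for every pattern, the node contains at least one keyword matching it", assumes the index holds lower-cased word tokens, and uses re.search for the default substring behaviour and re.fullmatch for 'w'. The names keywords_of, keyword_matches and match_nodes are hypothetical.

```python
import re


def keywords_of(value: str) -> set[str]:
    # Assumption: the fulltext index holds lower-cased word tokens
    return set(re.findall(r"\w+", value.lower()))


def keyword_matches(pattern: str, keyword: str, whole_keyword: bool) -> bool:
    # Default: any substring of the keyword may match; flag 'w': the whole keyword
    matcher = re.fullmatch if whole_keyword else re.search
    return matcher(pattern, keyword) is not None


def match_nodes(values: list[str], patterns: list[str], mode: str,
                flag: str = "") -> list[str]:
    """mode "all" models text:match-all, mode "any" models text:match-any."""
    combine = {"all": all, "any": any}[mode]
    whole_keyword = "w" in flag
    return [
        v for v in values
        if combine(any(keyword_matches(p, k, whole_keyword) for k in keywords_of(v))
                   for p in patterns)
    ]


values = ["The witness explained it", "An unexplained event"]
print(match_nodes(values, ["explain.*"], "any"))       # both values
print(match_nodes(values, ["explain.*"], "any", "w"))  # ['The witness explained it']
print(match_nodes(values, ["witness", "event"], "all"))  # []
```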

text:match-count

text:match-count($source as node()?) as xs:integer

Counts the number of fulltext matches within the nodes and subnodes in $source.

Parameters:
$source? The node whose fulltext matches, including those in its descendants, are counted
Returns:
xs:integer : the count

text:matches-regex

text:matches-regex($input as xs:string*, $pattern as xs:string) as xs:boolean

The function returns true if $input matches the regular expression supplied as $pattern; otherwise, it returns false.
If $input is the empty sequence, it is interpreted as the zero-length string.
Note:
The text:matches-regex() variants of the fn:matches() functions are identical except that they avoid the translation of the specified regular expression from XPath2 to Java syntax. That is, the regular expression is evaluated as is, and must be valid according to Java regular expression syntax, rather than the more restrictive XPath2 syntax.

Parameters:
$input* The input string
$pattern The pattern
Returns:
xs:boolean : true if the pattern is a match, false otherwise

text:matches-regex

text:matches-regex($input as xs:string*, $pattern as xs:string, $flags as xs:string) as xs:boolean

The function returns true if $input matches the regular expression supplied as $pattern as influenced by the value of $flags, if present; otherwise, it returns false.
The effect of calling this version of the function with the $flags argument set to a zero-length string is the same as using the two-argument version. Flags are defined in section 7.6.1.1 (Flags) of the XQuery 1.0 and XPath 2.0 Functions and Operators specification.
If $input is the empty sequence, it is interpreted as the zero-length string.
Note:
The text:matches-regex() variants of the fn:matches() functions are identical except that they avoid the translation of the specified regular expression from XPath2 to Java syntax. That is, the regular expression is evaluated as is, and must be valid according to Java regular expression syntax, rather than the more restrictive XPath2 syntax. An error is raised [err:FORX0001] if the value of $flags is invalid according to the rules described in section 7.6.1 Regular Expression Syntax.

Parameters:
$input* The input string
$pattern The pattern
$flags The flags
Returns:
xs:boolean : true if the pattern is a match, false otherwise
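The input and flag rules above can be sketched in Python. This is a model, not eXist code. The mapping of the flag letters s, m, i and x to Python re flags is an assumption (re.VERBOSE also treats '#' as a comment, unlike the XPath 'x' flag), the sketch rejects input sequences longer than one item because that case is not documented here, and the name matches_regex is hypothetical. As with fn:matches, the pattern may match anywhere in the input unless it is anchored.

```python
import re

# Assumption: XPath flag letters mapped to their closest Python re equivalents.
# Note: re.VERBOSE also treats '#' as a comment, unlike the XPath 'x' flag.
FLAGS = {"s": re.DOTALL, "m": re.MULTILINE, "i": re.IGNORECASE, "x": re.VERBOSE}


def matches_regex(input_: list[str], pattern: str, flags: str = "") -> bool:
    """Hypothetical model of text:matches-regex($input, $pattern, $flags)."""
    if len(input_) > 1:
        raise ValueError("this sketch only handles zero or one input string")
    text = input_[0] if input_ else ""  # empty sequence -> zero-length string
    options = 0
    for letter in flags:
        if letter not in FLAGS:
            raise ValueError(f"err:FORX0001 invalid flag {letter!r}")
        options |= FLAGS[letter]
    # Like fn:matches, an unanchored pattern may match anywhere in the input
    return re.search(pattern, text, options) is not None


print(matches_regex([], "^$"))                # True
print(matches_regex(["Hello"], "hello"))      # False
print(matches_regex(["Hello"], "hello", "i")) # True
```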

text:text-rank

text:text-rank($text as node()?) as xs:double

This is only a skeleton of a possible ranking function. Do not use it.

Parameters:
$text? The text to rank
Returns:
xs:double : the ranking of the text