public class TermCompletionAnalyzer extends org.apache.lucene.analysis.Analyzer
This analyzer generates several index terms for each word in the input. These terms are intended to match short sequences of user input (e.g. three or more characters), so that the user can then be given a drop-down list of matching terms.
This can be set up to address issues such as matching half-time when the user types tim, or when the user types halft (treating the hyphen as a soft hyphen); or matching TermCompletionAnalyzer when the user types the start of the embedded word Analyzer.
In contrast, Lucene's built-in analyzers are mainly geared towards the free-text search use case.
The intended use cases will typically involve a prefix query of the form:
?t bds:search "prefix*" .
to find all literals in the selected graphs that are indexed by a term starting with prefix. The problem this class addresses is therefore generating index terms that allow matching at sensible points mid-way through words (such as at hyphens).
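To see why mid-word index terms matter, the prefix query can be modelled as "some index term starts with the prefix". The following is a minimal, hypothetical sketch in plain Java (the class PrefixMatchSketch and its matching method are illustrative only, not part of Blazegraph or Lucene); it ignores case folding and everything else that a real bds:search performs.

```java
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch: models ?t bds:search "prefix*" as a startsWith test
 * over the index terms of a literal. Not the Blazegraph implementation.
 */
public class PrefixMatchSketch {

    /** Returns the index terms that a prefix query would hit. */
    static List<String> matching(List<String> indexTerms, String prefix) {
        return indexTerms.stream()
                .filter(term -> term.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Whole-word indexing only: "tim" cannot match inside "half-time".
        List<String> wholeWords = List.of("half-time");
        System.out.println(matching(wholeWords, "tim"));   // prints []

        // Also indexing the part after the hyphen lets "tim" match.
        List<String> withSubWords = List.of("half-time", "time");
        System.out.println(matching(withSubWords, "tim")); // prints [time]
    }
}
```

The analyzer's job is only to produce the second, richer list of index terms; the query side stays a plain prefix match.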
To get maximum effectiveness, it may be best to use private-use language subtags (see RFC 5646). One subtag, used on the data being loaded into the store, is mapped to this class; queries are tagged with a different subtag that is used only for bds:search (e.g. x-query) and is linked to some very simple analyzer such as KeywordAnalyzer.
The above prefix query then becomes:
?t bds:search "prefix*"@x-query .
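Constructor and Description
TermCompletionAnalyzer(Pattern wordBoundary, Pattern subWordBoundary)
    Divide the input into words, separated by the wordBoundary, and return a token for each whole word; then generate further tokens for each word by removing prefixes up to and including each successive match of subWordBoundary.
TermCompletionAnalyzer(Pattern wordBoundary, Pattern subWordBoundary, Pattern softHyphens, boolean alwaysRemoveSoftHypens)
    Divide the input into words and short tokens as with TermCompletionAnalyzer(Pattern, Pattern), additionally handling soft hyphens.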
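The two-argument rule (a token for each whole word, then the remainder of the word after each successive subWordBoundary match) can be sketched in plain Java without Lucene. This is a hypothetical illustration: SubWordTermsSketch and terms are invented names, and details such as skipping empty pieces are assumptions, not documented behaviour of TermCompletionAnalyzer. The camelCase pattern in the usage is likewise only one possible choice of subWordBoundary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical, Lucene-free sketch of the term generation described for
 * TermCompletionAnalyzer(Pattern wordBoundary, Pattern subWordBoundary).
 */
public class SubWordTermsSketch {

    static List<String> terms(String input, Pattern wordBoundary, Pattern subWordBoundary) {
        List<String> out = new ArrayList<>();
        for (String word : wordBoundary.split(input)) {
            if (word.isEmpty()) {
                continue; // assumption: ignore empty pieces from leading or repeated separators
            }
            out.add(word); // one token for the whole word
            Matcher m = subWordBoundary.matcher(word);
            while (m.find()) {
                // Remove the prefix up to and including this match.
                String rest = word.substring(m.end());
                if (!rest.isEmpty()) {
                    out.add(rest);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Pattern space = Pattern.compile(" ");
        System.out.println(terms("half-time score", space, Pattern.compile("-")));
        // prints [half-time, time, score]

        // Zero-width camelCase boundary: between a lower-case and an upper-case letter.
        Pattern camel = Pattern.compile("(?<=[a-z])(?=[A-Z])");
        System.out.println(terms("TermCompletionAnalyzer", space, camel));
        // prints [TermCompletionAnalyzer, CompletionAnalyzer, Analyzer]
    }
}
```

Because each extra token is a suffix of the word, a plain prefix query on any of them behaves like a match starting at that sub-word boundary.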
Methods inherited from class org.apache.lucene.analysis.Analyzer:
close, getOffsetGap, getPositionIncrementGap, getReuseStrategy, getVersion, initReader, setVersion, tokenStream, tokenStream
public TermCompletionAnalyzer(Pattern wordBoundary, Pattern subWordBoundary, Pattern softHyphens, boolean alwaysRemoveSoftHypens)
Divide the input into words and short tokens as with TermCompletionAnalyzer(Pattern, Pattern). Each term is generated, and then an additional term is generated with the soft hyphens (as defined by the softHyphens pattern) removed. If the alwaysRemoveSoftHypens flag is true, the first term (before the removal) is suppressed.
Parameters:
wordBoundary - the definition of space (e.g. " ")
subWordBoundary - also index after matches to this (e.g. "-")
softHyphens - discard these characters from matches
alwaysRemoveSoftHypens - if false, the discard step is optional: the term is indexed both with and without its soft hyphens.
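The soft-hyphen behaviour of the four-argument constructor can be sketched the same way. In this hypothetical example (SoftHyphenTermsSketch is an invented name, not Blazegraph code) both subWordBoundary and softHyphens are set to a hyphen, so that half-time is reachable from tim as well as from halft. Skipping a variant that is identical to the term already kept is an assumption made to avoid duplicate tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of the four-argument form: generate terms as for the
 * two-argument form, then add a variant with the soft hyphens removed.
 * The parameter name mirrors the documented constructor parameter.
 */
public class SoftHyphenTermsSketch {

    static List<String> terms(String input, Pattern wordBoundary, Pattern subWordBoundary,
                              Pattern softHyphens, boolean alwaysRemoveSoftHypens) {
        List<String> out = new ArrayList<>();
        for (String word : wordBoundary.split(input)) {
            if (word.isEmpty()) {
                continue;
            }
            // Step 1: whole word plus the remainder after each sub-word boundary.
            List<String> generated = new ArrayList<>();
            generated.add(word);
            Matcher m = subWordBoundary.matcher(word);
            while (m.find()) {
                String rest = word.substring(m.end());
                if (!rest.isEmpty()) {
                    generated.add(rest);
                }
            }
            // Step 2: add the variant without soft hyphens; optionally suppress the original.
            for (String term : generated) {
                String removed = softHyphens.matcher(term).replaceAll("");
                if (!alwaysRemoveSoftHypens) {
                    out.add(term);
                }
                // Assumption: skip a variant identical to the term just kept, or an empty one.
                boolean duplicate = !alwaysRemoveSoftHypens && removed.equals(term);
                if (!duplicate && !removed.isEmpty()) {
                    out.add(removed);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Pattern space = Pattern.compile(" ");
        Pattern hyphen = Pattern.compile("-");
        System.out.println(terms("half-time", space, hyphen, hyphen, false));
        // prints [half-time, halftime, time]
        System.out.println(terms("half-time", space, hyphen, hyphen, true));
        // prints [halftime, time]
    }
}
```

With the flag set to true, halft and tim still both find a term, but the hyphenated form itself is no longer indexed.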
protected org.apache.lucene.analysis.Analyzer.TokenStreamComponents createComponents(String fieldName)
Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.