TermCompletionAnalyzer (Blazegraph Database Platform 2.1.5 API)

java.lang.Object
- org.apache.lucene.analysis.Analyzer
- - com.bigdata.search.TermCompletionAnalyzer

All Implemented Interfaces:

Closeable, AutoCloseable
```
public class TermCompletionAnalyzer
extends org.apache.lucene.analysis.Analyzer
```
An analyzer intended for the term-completion use case; particularly for technical vocabularies and concept schemes.
This analyzer generates several index terms for each word in the input. These are intended to match short sequences (e.g. three or more characters) of user input, so that the user can be given a drop-down list of matching terms.
This can be set up to address issues like matching half-time when the user types tim or halft (treating the hyphen as a soft hyphen), or matching TermCompletionAnalyzer when the user types Ana.
In contrast, the Lucene Analyzers are mainly geared around the free text search use case.
The intended use cases will typically involve a prefix query of the form:
```
    ?t bds:search "prefix*" .
 
```
to find all literals in the selected graphs that are indexed by a term starting with prefix. The problem this class addresses is therefore choosing index terms that allow matching at sensible points mid-way through words (such as at hyphens).
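As a rough illustration of why the extra index terms matter, the following sketch models a prefix lookup over literals that each carry several index terms. It uses only the JDK; `PrefixLookupSketch` and `matches` are hypothetical names, the index terms are written out by hand, and the comparison is a plain case-sensitive `startsWith`. None of this is taken from the Blazegraph search implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustration only: a toy prefix lookup, not Blazegraph's search index. */
final class PrefixLookupSketch {

    /** Returns every literal that has at least one index term starting with the prefix. */
    static List<String> matches(Map<String, List<String>> indexTermsByLiteral, String prefix) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : indexTermsByLiteral.entrySet()) {
            for (String term : entry.getValue()) {
                if (term.startsWith(prefix)) {
                    hits.add(entry.getKey());
                    break; // one matching term is enough for this literal
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hand-written index terms of the kind the class description talks about.
        Map<String, List<String>> index = new LinkedHashMap<>();
        index.put("half-time", List.of("half-time", "halftime", "time"));
        index.put("TermCompletionAnalyzer",
                List.of("TermCompletionAnalyzer", "CompletionAnalyzer", "Analyzer"));

        System.out.println(matches(index, "tim")); // [half-time]
        System.out.println(matches(index, "Ana")); // [TermCompletionAnalyzer]
    }
}
```

With only the whole words indexed, neither "tim" nor "Ana" would match; the additional terms are what make the mid-word prefixes findable.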
For maximum effectiveness it may be best to use private-use language subtags (see RFC 5646). For example, the data being loaded into the store could be tagged "x-term", which ConfigurableAnalyzerFactory maps to this class, while queries are tagged with a different language tag used only for bds:search, e.g. "x-query", which is linked to a very simple analyzer such as KeywordAnalyzer. The above prefix query then becomes:
```
    ?t bds:search "prefix*"@x-query .
 
```
Author:

jeremycarroll

Nested Class Summary
- Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
  org.apache.lucene.analysis.Analyzer.ReuseStrategy, org.apache.lucene.analysis.Analyzer.TokenStreamComponents

Field Summary
- Fields inherited from class org.apache.lucene.analysis.Analyzer
  GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY

Constructor Summary

Constructors
Constructor and Description
`TermCompletionAnalyzer(Pattern wordBoundary, Pattern subWordBoundary)` Divide the input into words, separated by the wordBoundary, and return a token for each whole word, and then generate further tokens for each word by removing prefixes up to and including each successive match of subWordBoundary.
`TermCompletionAnalyzer(Pattern wordBoundary, Pattern subWordBoundary, Pattern softHyphens, boolean alwaysRemoveSoftHypens)` Divide the input into words and short tokens as with `TermCompletionAnalyzer(Pattern, Pattern)`.

Method Summary

Methods
Modifier and Type	Method and Description
`protected org.apache.lucene.analysis.Analyzer.TokenStreamComponents`	`createComponents(String fieldName)`
- Methods inherited from class org.apache.lucene.analysis.Analyzer
  close, getOffsetGap, getPositionIncrementGap, getReuseStrategy, getVersion, initReader, setVersion, tokenStream, tokenStream
- Methods inherited from class java.lang.Object
  clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - TermCompletionAnalyzer
```
public TermCompletionAnalyzer(Pattern wordBoundary,
                      Pattern subWordBoundary,
                      Pattern softHyphens,
                      boolean alwaysRemoveSoftHypens)
```
    Divide the input into words and short tokens as with TermCompletionAnalyzer(Pattern, Pattern). Each term is generated, and then an additional term is generated with soft hyphens (as defined by the softHyphens pattern) removed. If the alwaysRemoveSoftHypens flag is true, then the first term (before the removal) is suppressed.
    
    Parameters:
    wordBoundary - The definition of space (e.g. " ")
    subWordBoundary - Also index after matches to this (e.g. "-")
    softHyphens - Discard these characters from matches
    alwaysRemoveSoftHypens - If true, only the term with soft hyphens removed is generated; if false, the term before removal is generated as well.
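
    The soft-hyphen step described above can be pictured with the following JDK-only sketch. `SoftHyphenSketch` and `expand` are hypothetical names, and the sketch is an assumption-laden illustration of the documented behaviour, not the analyzer's own code. In particular, skipping a stripped term that is identical to the original (or empty) is a choice made here for readability; the source does not say how the real class handles that case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Illustration only: mimics the documented soft-hyphen handling. */
final class SoftHyphenSketch {

    /**
     * For each term, emits the term as is (unless alwaysRemoveSoftHypens is true)
     * and then the term with every softHyphens match removed.
     */
    static List<String> expand(List<String> terms, Pattern softHyphens,
                               boolean alwaysRemoveSoftHypens) {
        List<String> out = new ArrayList<>();
        for (String term : terms) {
            String stripped = softHyphens.matcher(term).replaceAll("");
            if (!alwaysRemoveSoftHypens) {
                out.add(term); // the term before removal
            }
            // Assumption: do not emit an empty term or repeat an unchanged one.
            boolean changed = !stripped.equals(term);
            if (!stripped.isEmpty() && (alwaysRemoveSoftHypens || changed)) {
                out.add(stripped);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("half-time", "time");
        Pattern hyphen = Pattern.compile("-");
        System.out.println(expand(terms, hyphen, false)); // [half-time, halftime, time]
        System.out.println(expand(terms, hyphen, true));  // [halftime, time]
    }
}
```

    With the flag set to false, a prefix such as halft and a prefix such as half-t can both find the same word; with the flag set to true only the hyphen-free form is kept.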
  - TermCompletionAnalyzer
```
public TermCompletionAnalyzer(Pattern wordBoundary,
                      Pattern subWordBoundary)
```
    Divide the input into words, separated by the wordBoundary, and return a token for each whole word, and then generate further tokens for each word by removing prefixes up to and including each successive match of subWordBoundary.
    
    Parameters:
    wordBoundary - The definition of space (e.g. " ")
    subWordBoundary - Also index after matches to this (e.g. "-")
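
    The token generation described for this constructor can be sketched with `java.util.regex` alone. `SubWordTermsSketch` and `terms` are hypothetical names; the code follows the prose description (whole word first, then the remainder after each subWordBoundary match) and is not the Lucene token stream that the real class builds. Skipping empty words and empty remainders is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustration only: mimics the documented term generation. */
final class SubWordTermsSketch {

    /** Returns each whole word, followed by what remains after each subWordBoundary match. */
    static List<String> terms(String input, Pattern wordBoundary, Pattern subWordBoundary) {
        List<String> out = new ArrayList<>();
        for (String word : wordBoundary.split(input)) {
            if (word.isEmpty()) {
                continue; // assumption: ignore empty pieces between boundaries
            }
            out.add(word); // the whole word
            Matcher m = subWordBoundary.matcher(word);
            while (m.find()) {
                // Remove the prefix up to and including this match.
                if (m.end() > 0 && m.end() < word.length()) {
                    out.add(word.substring(m.end()));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Pattern space = Pattern.compile(" ");
        System.out.println(terms("half-time score", space, Pattern.compile("-")));
        // [half-time, time, score]

        // A zero-width boundary before each capital letter splits camel-case names.
        System.out.println(terms("TermCompletionAnalyzer", space, Pattern.compile("(?=[A-Z])")));
        // [TermCompletionAnalyzer, CompletionAnalyzer, Analyzer]
    }
}
```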
- Method Detail
  - createComponents
```
protected org.apache.lucene.analysis.Analyzer.TokenStreamComponents createComponents(String fieldName)
```
    Specified by:
    
    createComponents in class org.apache.lucene.analysis.Analyzer
