public class ConfigurableAnalyzerFactory extends Object implements IAnalyzerFactory
This class specifies which Analyzers are used for which languages. Languages are specified by the language tags on RDF literals, which conform with RFC 5646. Within bigdata, plain literals are assigned to the default locale's language. The bigdata properties are used to map language ranges, as specified by RFC 4647, to classes which extend Analyzer.
Supported classes include all the natural-language-specific classes from Lucene, and also:
any subclass of Analyzer that has at least one constructor matching:
If the class has a method named getDefaultStopSet(), then this is assumed to return the class's default stop word set; some of the Lucene analyzers store their default stop words elsewhere, and such stopwords are usable by this class. If no stop word set can be found, and there is both a constructor without stopwords and a constructor with stopwords, then the former is assumed to use a default stop word set.
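The stop word lookup described above can be sketched with plain Java reflection. This is only an illustration under stated assumptions, not the factory's actual code: the class `StopSetLookupSketch`, the helper `defaultStopSet`, and the two stand-in classes are hypothetical, and the sketch assumes that `getDefaultStopSet()` is a public static method taking no arguments.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

class StopSetLookupSketch {

    /** Hypothetical stand-in for an analyzer class that exposes getDefaultStopSet(). */
    static class WithStopSet {
        public static Set<String> getDefaultStopSet() {
            return new HashSet<>(Arrays.asList("a", "the"));
        }
    }

    /** Hypothetical stand-in for an analyzer class without such a method. */
    static class WithoutStopSet {
    }

    /**
     * Looks for a public static no-argument method named getDefaultStopSet()
     * (an assumption of this sketch) and returns its result, if any.
     */
    static Optional<Object> defaultStopSet(Class<?> cls) {
        try {
            Method m = cls.getMethod("getDefaultStopSet");
            if (!Modifier.isStatic(m.getModifiers())) {
                return Optional.empty();
            }
            return Optional.ofNullable(m.invoke(null));
        } catch (ReflectiveOperationException e) {
            // No such method, or it could not be invoked: no stop set found here.
            return Optional.empty();
        }
    }
}
```

When the lookup returns an empty result, the fallback rule from the text applies: if the class offers constructors both with and without stopwords, the one without is assumed to supply a default set.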
Configuration is by means of the bigdata properties file.
All relevant properties start with
com.bigdata.search.ConfigurableAnalyzerFactory, which we
abbreviate to c.b.s.C in this documentation.
Properties from ConfigurableAnalyzerFactory.Options apply to the factory.
Other properties, from
ConfigurableAnalyzerFactory.AnalyzerOptions, start with
a prefix that includes a language range conforming with the extended language range construct from RFC 4647, section 2.2.
Because bigdata does not allow '*' in property names, the character '_' is used to
substitute for '*' in extended language ranges in property names.
These are used to specify an analyzer for the given language range.
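The '_' for '*' substitution can be sketched as a pair of string conversions. The class and method names below are hypothetical and only illustrate the character mapping; they do not build a complete property name. The reverse mapping is safe because language subtags consist of letters and digits only, so '_' never occurs in a real language range.

```java
class LanguageRangeKeySketch {

    /** Rewrites an extended language range so it can appear inside a property name. */
    static String toPropertyForm(String languageRange) {
        return languageRange.replace('*', '_');
    }

    /** Recovers the extended language range from its property-name form. */
    static String fromPropertyForm(String keyFragment) {
        return keyFragment.replace('_', '*');
    }
}
```

For example, the range `de-*-DE` would be written as `de-_-DE` in a property name, and the catch-all range `*` as `_`.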
If no analyzer is specified for the language range * then the StandardAnalyzer is used.
Given any specific language, the analyzer matching the longest configured language range,
measured in number of subtags, is returned by getAnalyzer(String, boolean).
In the event of a tie, the alphabetically first language range is used.
The algorithm to find a match is "Extended Filtering" as defined in section 3.3.2 of RFC 4647.
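The selection rule can be sketched as follows. This is an illustration under stated assumptions rather than the factory's own implementation: the class `LanguageRangeMatchSketch` and its methods are hypothetical, `matches` follows the Extended Filtering steps of RFC 4647 section 3.3.2, every hyphen-separated subtag (including an embedded wildcard) is assumed to count toward the length, the bare range `*` is assumed to count as zero so that it acts as the fallback, and the tie-break uses plain `String.compareTo`.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

class LanguageRangeMatchSketch {

    /** Extended Filtering (RFC 4647, section 3.3.2): does the range match the tag? */
    static boolean matches(String range, String tag) {
        String[] r = range.toLowerCase(Locale.ROOT).split("-");
        String[] t = tag.toLowerCase(Locale.ROOT).split("-");
        // Step 2: the first subtags must match, unless the range starts with '*'.
        if (!r[0].equals("*") && !r[0].equals(t[0])) {
            return false;
        }
        int ri = 1;
        int ti = 1;
        // Step 3: walk the remaining subtags of the range.
        while (ri < r.length) {
            if (r[ri].equals("*")) {
                ri++;                      // 3A: skip wildcards in the range
            } else if (ti >= t.length) {
                return false;              // 3B: tag exhausted
            } else if (r[ri].equals(t[ti])) {
                ri++;                      // 3C: subtags match, advance both
                ti++;
            } else if (t[ti].length() == 1) {
                return false;              // 3D: singleton in the tag blocks the match
            } else {
                ti++;                      // 3E: skip this tag subtag
            }
        }
        return true;                       // Step 4: range exhausted
    }

    /** Length in subtags; the bare "*" counts as zero (an assumption of this sketch). */
    static int subtagCount(String range) {
        return range.equals("*") ? 0 : range.split("-").length;
    }

    /** Longest matching range wins; ties go to the alphabetically first range. */
    static Optional<String> bestRange(List<String> configuredRanges, String tag) {
        String best = null;
        for (String candidate : configuredRanges) {
            if (!matches(candidate, tag)) {
                continue;
            }
            if (best == null) {
                best = candidate;
                continue;
            }
            int cmp = Integer.compare(subtagCount(candidate), subtagCount(best));
            if (cmp > 0 || (cmp == 0 && candidate.compareTo(best) < 0)) {
                best = candidate;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

With the configured ranges `*`, `en`, and `en-GB`, this sketch selects `en-GB` for the tag `en-GB`, `en` for `en-US`, and `*` for `fr`.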
Some useful analyzers are as follows:
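|Class and Description|
|ConfigurableAnalyzerFactory.AnalyzerOptions: Options understood by analyzers created by ConfigurableAnalyzerFactory.|
|ConfigurableAnalyzerFactory.Options: Options understood by the ConfigurableAnalyzerFactory.|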
|Constructor and Description|
|ConfigurableAnalyzerFactory(FullTextIndex<?> fullTextIndex): Builds a new ConfigurableAnalyzerFactory.|
|Modifier and Type||Method and Description|
|org.apache.lucene.analysis.Analyzer||getAnalyzer(String languageCode, boolean filterStopwords): Return the token analyzer to be used for the given language code.|
public ConfigurableAnalyzerFactory(FullTextIndex<?> fullTextIndex)
public org.apache.lucene.analysis.Analyzer getAnalyzer(String languageCode, boolean filterStopwords)
Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.