Deprecated. Use ConfigurableAnalyzerFactory with the ConfigurableAnalyzerFactory.Options.NATURAL_LANGUAGE_SUPPORT option, which uses the appropriate natural language analyzers for the two letter codes and for tags which include sub-tags.

public class DefaultAnalyzerFactory extends Object implements IAnalyzerFactory
This legacy implementation does not use the correct Analyzer for almost all languages (other than English). It uses the correct natural language analyzer only for literals tagged with certain three letter ISO 639 codes:
"por", "deu", "ger", "zho", "chi", "jpn", "kor", "ces", "cze", "dut", "nld", "gre", "ell",
"fra", "fre", "rus" and "tha". All other tags are treated as English.
These codes do not work if they are used with subtags, e.g. "ger-AT" is treated as English.
No two letter code other than "en" works correctly. Note that the W3C and IETF recommend the use of the two letter forms instead of the three letter forms.

Constructor and Description |
---|
DefaultAnalyzerFactory(FullTextIndex fullTextIndex) Deprecated. |
Modifier and Type | Method and Description |
---|---|
org.apache.lucene.analysis.Analyzer | getAnalyzer(String languageCode, boolean filterStopwords) Deprecated. Return the token analyzer to be used for the given language code. |
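
The language tag behaviour stated in the class description can be modelled with a short, self-contained sketch. This is not Blazegraph code: the class LegacyTagModel and the method treatedAsEnglish are hypothetical names introduced only for illustration, and the sketch assumes exact, case-sensitive matching, which the description does not specify. A null language code (default Locale) is outside its scope.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical model of the documented tag handling; NOT the Blazegraph implementation.
class LegacyTagModel {

    // The three letter ISO 639 codes listed in the class description.
    private static final Set<String> LANGUAGE_SPECIFIC = new HashSet<>(Arrays.asList(
            "por", "deu", "ger", "zho", "chi", "jpn", "kor", "ces", "cze",
            "dut", "nld", "gre", "ell", "fra", "fre", "rus", "tha"));

    // Returns true when, per the description, the tag falls back to English analysis.
    // Assumption: exact, case-sensitive comparison of the whole tag, so a tag
    // with a subtag such as "ger-AT" never matches a listed code.
    static boolean treatedAsEnglish(String languageTag) {
        Objects.requireNonNull(languageTag, "null (default Locale) is not modelled here");
        return !LANGUAGE_SPECIFIC.contains(languageTag);
    }

    public static void main(String[] args) {
        // Print how each sample tag is classified by the model.
        for (String tag : new String[] {"deu", "ger-AT", "de", "en"}) {
            System.out.println(tag + " treated as English: " + treatedAsEnglish(tag));
        }
    }
}
```

Under this model a two letter tag such as "de" falls back to English analysis, which is the limitation that motivates the deprecation in favour of ConfigurableAnalyzerFactory.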
public DefaultAnalyzerFactory(FullTextIndex fullTextIndex)
public org.apache.lucene.analysis.Analyzer getAnalyzer(String languageCode, boolean filterStopwords)
Specified by: getAnalyzer in interface IAnalyzerFactory
Parameters:
languageCode - The language code or null to use the default Locale.
filterStopwords - if false, return an analyzer with no stopwords

Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.