Important
Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
| Token filter name | Description |
| --- | --- |
| Apostrophe | Strips all characters after an apostrophe, including the apostrophe itself. See [ApostropheFilter](http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/tr/ApostropheFilter.html). |
| ArabicNormalization | A token filter that applies the Arabic normalizer to normalize the orthography. See [ArabicNormalizationFilter](http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/ar/ArabicNormalizationFilter.html). |
| AsciiFolding | Converts alphabetic, numeric, and symbolic Unicode characters that are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if such equivalents exist. See [ASCIIFoldingFilter](http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html). |
| CjkBigram | Forms bigrams of CJK terms that are generated from StandardTokenizer. See [CJKBigramFilter](http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/cjk/CJKBigramFilter.html). |
| CjkWidth | Normalizes CJK width differences. Folds fullwidth ASCII variants into the equivalent basic Latin, and half-width Katakana variants into the equivalent Kana. See [CJKWidthFilter](http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/cjk/CJKWidthFilter.html). |
| Classic | Removes English possessives and dots from acronyms. See [ClassicFilter](http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/standard/ClassicFilter.html). |
| CommonGram | Constructs bigrams for frequently occurring terms while indexing. Single terms are still indexed too, with bigrams overlaid. See [CommonGramsFilter](http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/commongrams/CommonGramsFilter.html). |
| EdgeNGram | Generates n-grams of the given size(s) starting from the front or the back of an input token. See [EdgeNGramTokenFilter](http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/ngram/EdgeNGramTokenFilter.html). |
| Elision | Removes elisions. For example, "l'avion" (the plane) is converted to "avion" (plane). See [ElisionFilter](http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/util/ElisionFilter.html). |
| GermanNormalization | Normalizes German characters according to the heuristics of the German2 snowball algorithm. See [GermanNormalizationFilter](http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/de/GermanNormalizationFilter.html). |
| HindiNormalization | Normalizes text in Hindi to remove some differences in spelling variations. See [HindiNormalizationFilter](http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/hi/HindiNormalizationFilter.html). |
| IndicNormalization | Normalizes the Unicode representation of text in Indian languages. See [IndicNormalizationFilter](http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/in/IndicNormalizationFilter.html). |
| KeywordRepeat | Emits each incoming token twice, once as a keyword and once as a non-keyword. See [KeywordRepeatFilter](http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/KeywordRepeatFilter.html). |
| KStem | A high-performance kstem filter for English. See [KStemFilter](http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/en/KStemFilter.html). |
| Length | Removes words that are too long or too short. See [LengthFilter](http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/LengthFilter.html). |
| Limit | Limits the number of tokens while indexing. See [LimitTokenCountFilter](http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/LimitTokenCountFilter.html). |
| Lowercase | Normalizes token text to lowercase. See [LowerCaseFilter](http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/LowerCaseFilter.html). |
| NGram | Generates n-grams of the given size(s). See [NGramTokenFilter](http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/ngram/NGramTokenFilter.html). |
| PersianNormalization | Applies normalization for Persian. See [PersianNormalizationFilter](http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/fa/PersianNormalizationFilter.html). |
| Phonetic | Creates tokens for phonetic matches. See the [phonetic analysis package](https://lucene.apache.org/core/4_10_3/analyzers-phonetic/org/apache/lucene/analysis/phonetic/package-tree.html). |
| PorterStem | Transforms the token stream using the [Porter stemming algorithm](http://tartarus.org/~martin/PorterStemmer/). |
| Reverse | Reverses the token string. See [ReverseStringFilter](http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/reverse/ReverseStringFilter.html). |
| ScandinavianFoldingNormalization | Folds the Scandinavian characters åÅäæÄÆ->a and öÖøØ->o. It also discriminates against the use of double vowels aa, ae, ao, oe, and oo, leaving just the first one. See [ScandinavianFoldingFilter](http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/ScandinavianFoldingFilter.html). |
| ScandinavianNormalization | Normalizes the use of the interchangeable Scandinavian characters. See [ScandinavianNormalizationFilter](http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/ScandinavianNormalizationFilter.html). |
| Shingle | Creates combinations of tokens as a single token. See [ShingleFilter](http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/shingle/ShingleFilter.html). |
| Snowball | Stems words using a Snowball-generated stemmer. See [SnowballFilter](http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/snowball/SnowballFilter.html). |
| SoraniNormalization | Normalizes the Unicode representation of Sorani text. See [SoraniNormalizationFilter](http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/ckb/SoraniNormalizationFilter.html). |
| Stemmer | Language-specific stemming filter. See [Custom analyzers in Azure Search](https://docs.microsoft.com/rest/api/searchservice/Custom-analyzers-in-Azure-Search#TokenFilters). |
| Stopwords | Removes stop words from a token stream. See [StopFilter](http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/StopFilter.html). |
| Trim | Trims leading and trailing whitespace from tokens. See [TrimFilter](http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/TrimFilter.html). |
| Truncate | Truncates the terms to a specific length. See [TruncateTokenFilter](http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/TruncateTokenFilter.html). |
| Unique | Filters out tokens with the same text as the previous token. See [RemoveDuplicatesTokenFilter](http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/RemoveDuplicatesTokenFilter.html). |
| Uppercase | Normalizes token text to uppercase. See [UpperCaseFilter](http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/UpperCaseFilter.html). |
| WordDelimiter | Splits words into subwords and performs optional transformations on subword groups. |
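Token filters compose into a pipeline: each filter consumes the token stream produced by the tokenizer (or by the previous filter) and emits a transformed stream. As a rough plain-Python sketch of that behavior (an illustration only, not the Lucene implementation; the `min_gram`/`max_gram` parameters mirror the EdgeNGram sizing options described above), here is a Lowercase filter chained into a front-edge EdgeNGram filter:

```python
def lowercase_filter(tokens):
    # Normalizes token text to lowercase (cf. the Lowercase filter).
    return [t.lower() for t in tokens]

def edge_ngram_filter(tokens, min_gram=2, max_gram=3):
    # Emits n-grams anchored at the front of each token
    # (cf. the EdgeNGram filter with front-edge generation).
    # Tokens shorter than min_gram produce no output in this sketch.
    out = []
    for t in tokens:
        for n in range(min_gram, min(max_gram, len(t)) + 1):
            out.append(t[:n])
    return out

tokens = ["Azure", "Search"]
result = edge_ngram_filter(lowercase_filter(tokens))
print(result)  # ['az', 'azu', 'se', 'sea']
```

Chaining a lowercasing step before edge n-gram generation is a common pattern for prefix matching ("type-ahead" scenarios), since queries can then match case-insensitively against indexed prefixes.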