Elasticsearch (ES) ships with a number of built-in analyzers for tokenizing text. An analyzer is typically composed of character filters, a tokenizer, and token filters. Below are the most commonly used built-in analyzers, each with an example.

### 1. Standard Analyzer
The default analyzer, suitable for most languages. Processing steps:

- The Standard Tokenizer splits text on word boundaries (Unicode Text Segmentation) and drops most punctuation.
- The Lowercase Token Filter converts each token to lowercase.

Example:

```json
POST _analyze
{
  "analyzer": "standard",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
```

Output: `[the, 2, quick, brown, foxes, jumped, over, the, lazy, dog's, bone]`

### 2. Simple Analyzer
Splits on any non-letter character (digits, punctuation, etc.) and lowercases the tokens. Example:

```json
POST _analyze
{
  "analyzer": "simple",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
```

Output: `[the, quick, brown, foxes, jumped, over, the, lazy, dog, s, bone]`

### 3. Whitespace Analyzer
Splits only on whitespace; it does not lowercase and does not strip punctuation. Example:

```json
POST _analyze
{
  "analyzer": "whitespace",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
```

Output: `[The, 2, QUICK, Brown-Foxes, jumped, over, the, lazy, dog's, bone.]`

### 4. Keyword Analyzer
Treats the entire input as a single token, with no tokenization at all. Example:

```json
POST _analyze
{
  "analyzer": "keyword",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
```

Output: `[The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.]`

### 5. Stop Analyzer
Like the Simple Analyzer, but additionally removes common stop words (e.g. "the", "and", "a"). Example:

```json
POST _analyze
{
  "analyzer": "stop",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
```

Output: `[quick, brown, foxes, jumped, over, lazy, dog, s, bone]`

### 6. Pattern Analyzer
Defines the tokenization rule with a regular expression. By default it splits on non-word characters (the pattern `\W+`) and lowercases the tokens. Example:

```json
POST _analyze
{
  "analyzer": "pattern",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
```

Output: `[the, 2, quick, brown, foxes, jumped, over, the, lazy, dog, s, bone]`

### 7. Language Analyzers
Analyzers optimized for specific languages (English, French, and many others). The `english` analyzer, for example, removes English stop words and applies stemming. Example:

```json
POST _analyze
{
  "analyzer": "english",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
```

Output: `[2, quick, brown, fox, jump, over, lazi, dog, bone]` (note the stemming: foxes → fox, jumped → jump, lazy → lazi)

### 8. ICU Analyzer
Based on the ICU (International Components for Unicode) library, with better support for multilingual text. It requires the `analysis-icu` plugin to be installed. Example:

```json
POST _analyze
{
  "analyzer": "icu_analyzer",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
```

Output: `[the, 2, quick, brown, foxes, jumped, over, the, lazy, dog's, bone]`

### 9. Fingerprint Analyzer
Tokenizes and lowercases the text, removes duplicates, sorts the tokens, and concatenates them into a single "fingerprint" token. Example:

```json
POST _analyze
{
  "analyzer": "fingerprint",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
```

Output (a single token): `[2 bone brown dog's foxes jumped lazy over quick the]`

### Summary
Elasticsearch's built-in analyzers cover a wide range of scenarios. Pick the one that matches your needs, or define a custom analyzer when none of the built-ins fits.
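As a sketch of the custom-analyzer route mentioned above, the three component types from the introduction (character filters, a tokenizer, token filters) can be combined in index settings. The index name `my_index`, the analyzer name, and the particular component choices here are illustrative assumptions, not a recommendation:

```json
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  }
}
```

It can then be exercised with the same `_analyze` API used throughout this article, e.g. `POST my_index/_analyze` with `"analyzer": "my_custom_analyzer"` and `"text": "<p>The QUICK fox!</p>"`, which should yield `[quick, fox]`: `html_strip` removes the tags, the standard tokenizer splits the words, `lowercase` normalizes them, and `stop` drops "the".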
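In practice an analyzer is usually assigned per field in the index mapping rather than passed to `_analyze` ad hoc. A minimal sketch (the index and field names are hypothetical):

```json
PUT my_blog
{
  "mappings": {
    "properties": {
      "title":   { "type": "text", "analyzer": "english" },
      "tags":    { "type": "keyword" },
      "content": { "type": "text", "analyzer": "standard" }
    }
  }
}
```

Note that `keyword` fields are not analyzed at all (matching the Keyword Analyzer's behavior), which is what you want for exact-match filtering and aggregations.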