

news 2026/4/27 5:44:54

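About the author: the author works in application security and big data, with 8 years of development experience and 5 years of experience as an interviewer. Java technical expert, web architect, Alibaba Cloud expert blogger, Huawei Cloud Yunxiang expert, 51CTO expert blogger.

Contents
1. What is the Standard tokenizer?
2. What is the Simple analyzer?
3. What is the WhiteSpace tokenizer?
4. What is the Keyword tokenizer?

1. What is the Standard tokenizer?

The Standard tokenizer is one of the most commonly used tokenizers in Elasticsearch and Lucene. It is designed for natural-language text and splits it into individual tokens at word boundaries, following the Unicode Text Segmentation algorithm (Unicode Standard Annex #29), so it works for text in many languages.

Features:
- Word recognition: finds word boundaries using the Unicode segmentation rules.
- Punctuation: drops most punctuation. It does not keep email addresses or URLs intact; a URL such as https://www.elastic.co is split into https and www.elastic.co. To keep them as single tokens, use the uax_url_email tokenizer instead.
- Numbers: recognizes numbers and keeps them as tokens.
- Special characters: an apostrophe or a period between letters stays inside the token, while a hyphen splits it.

The example below uses the standard analyzer, which runs the Standard tokenizer and then a lowercase filter. That filter is why the tokens come back in lowercase; the tokenizer on its own preserves case.

Example:

    POST _analyze
    {
      "analyzer": "standard",
      "text": "Elasticsearch is a powerful search engine. Visit https://www.elastic.co for more information."
    }

Output:

    {
      "tokens": [
        { "token": "elasticsearch", "start_offset": 0, "end_offset": 13, "type": "<ALPHANUM>", "position": 0 },
        { "token": "is", "start_offset": 14, "end_offset": 16, "type": "<ALPHANUM>", "position": 1 },
        { "token": "a", "start_offset": 17, "end_offset": 18, "type": "<ALPHANUM>", "position": 2 },
        { "token": "powerful", "start_offset": 19, "end_offset": 27, "type": "<ALPHANUM>", "position": 3 },
        { "token": "search", "start_offset": 28, "end_offset": 34, "type": "<ALPHANUM>", "position": 4 },
        { "token": "engine", "start_offset": 35, "end_offset": 41, "type": "<ALPHANUM>", "position": 5 },
        { "token": "visit", "start_offset": 43, "end_offset": 48, "type": "<ALPHANUM>", "position": 6 },
        { "token": "https", "start_offset": 49, "end_offset": 54, "type": "<ALPHANUM>", "position": 7 },
        { "token": "www.elastic.co", "start_offset": 57, "end_offset": 71, "type": "<ALPHANUM>", "position": 8 },
        { "token": "for", "start_offset": 72, "end_offset": 75, "type": "<ALPHANUM>", "position": 9 },
        { "token": "more", "start_offset": 76, "end_offset": 80, "type": "<ALPHANUM>", "position": 10 },
        { "token": "information", "start_offset": 81, "end_offset": 92, "type": "<ALPHANUM>", "position": 11 }
      ]
    }

2. What is the Simple analyzer?

The Simple analyzer is a minimal analyzer: it splits text at every non-letter character (whitespace, punctuation, digits and so on), keeps only the runs of letters, and converts them to lowercase. In Elasticsearch it is requested as the simple analyzer rather than as a tokenizer.

Features:
- Simple splitting: breaks only at non-letter characters.
- Lowercasing: converts every letter to lowercase.
- No numbers: digits count as non-letter characters, so they act as separators and are discarded.

Example:

    POST _analyze
    {
      "analyzer": "simple",
      "text": "Elasticsearch is a powerful search engine. Visit www.elastic.co for more information."
    }

Output:

    {
      "tokens": [
        { "token": "elasticsearch", "start_offset": 0, "end_offset": 13, "type": "word", "position": 0 },
        { "token": "is", "start_offset": 14, "end_offset": 16, "type": "word", "position": 1 },
        { "token": "a", "start_offset": 17, "end_offset": 18, "type": "word", "position": 2 },
        { "token": "powerful", "start_offset": 19, "end_offset": 27, "type": "word", "position": 3 },
        { "token": "search", "start_offset": 28, "end_offset": 34, "type": "word", "position": 4 },
        { "token": "engine", "start_offset": 35, "end_offset": 41, "type": "word", "position": 5 },
        { "token": "visit", "start_offset": 43, "end_offset": 48, "type": "word", "position": 6 },
        { "token": "www", "start_offset": 49, "end_offset": 52, "type": "word", "position": 7 },
        { "token": "elastic", "start_offset": 53, "end_offset": 60, "type": "word", "position": 8 },
        { "token": "co", "start_offset": 61, "end_offset": 63, "type": "word", "position": 9 },
        { "token": "for", "start_offset": 64, "end_offset": 67, "type": "word", "position": 10 },
        { "token": "more", "start_offset": 68, "end_offset": 72, "type": "word", "position": 11 },
        { "token": "information", "start_offset": 73, "end_offset": 84, "type": "word", "position": 12 }
      ]
    }

Because the periods in www.elastic.co are non-letter characters, the host name is split into three tokens: www, elastic and co.

3. What is the WhiteSpace tokenizer?

The WhiteSpace tokenizer is one of the simplest tokenizers: it splits text only at whitespace characters and does nothing to punctuation or other special characters.

Features:
- Splits at whitespace: whitespace is the only separator.
- Keeps every character: nothing is dropped or changed, so punctuation, digits and letter case are all preserved.

Example:

    POST _analyze
    {
      "tokenizer": "whitespace",
      "text": "Elasticsearch is a powerful search engine. Visit www.elastic.co for more information."
    }

Output:

    {
      "tokens": [
        { "token": "Elasticsearch", "start_offset": 0, "end_offset": 13, "type": "word", "position": 0 },
        { "token": "is", "start_offset": 14, "end_offset": 16, "type": "word", "position": 1 },
        { "token": "a", "start_offset": 17, "end_offset": 18, "type": "word", "position": 2 },
        { "token": "powerful", "start_offset": 19, "end_offset": 27, "type": "word", "position": 3 },
        { "token": "search", "start_offset": 28, "end_offset": 34, "type": "word", "position": 4 },
        { "token": "engine.", "start_offset": 35, "end_offset": 42, "type": "word", "position": 5 },
        { "token": "Visit", "start_offset": 43, "end_offset": 48, "type": "word", "position": 6 },
        { "token": "www.elastic.co", "start_offset": 49, "end_offset": 63, "type": "word", "position": 7 },
        { "token": "for", "start_offset": 64, "end_offset": 67, "type": "word", "position": 8 },
        { "token": "more", "start_offset": 68, "end_offset": 72, "type": "word", "position": 9 },
        { "token": "information.", "start_offset": 73, "end_offset": 85, "type": "word", "position": 10 }
      ]
    }

4. What is the Keyword tokenizer?

The Keyword tokenizer is a tokenizer that does not tokenize: it emits the entire input text as one single token, so the input is never split.

Features:
- No splitting: the whole input becomes one token.
- Kept as-is: no conversion or modification is applied.

Example:

    POST _analyze
    {
      "tokenizer": "keyword",
      "text": "Elasticsearch is a powerful search engine. Visit www.elastic.co for more information."
    }

Output:

    {
      "tokens": [
        { "token": "Elasticsearch is a powerful search engine. Visit www.elastic.co for more information.", "start_offset": 0, "end_offset": 85, "type": "word", "position": 0 }
      ]
    }

Summary:
- Standard tokenizer: suited to natural-language text; splits at Unicode word boundaries, keeps words and numbers, and drops most punctuation.
- Simple analyzer: splits at non-letter characters and lowercases every letter.
- WhiteSpace tokenizer: splits only at whitespace and keeps every character.
- Keyword tokenizer: treats the whole input as one token, with no splitting.

Each has its own characteristics and suits different scenarios. Choosing the one that matches how a field will be searched improves the accuracy of search and indexing.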
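To make the difference between the Simple, WhiteSpace and Keyword behaviours described above concrete, here is a minimal Python sketch that imitates them with regular expressions. It is an approximation for illustration only, not how Lucene implements them; the function names (`tokenize`, `simple`, `whitespace`, `keyword`) are hypothetical helpers, and the Standard tokenizer is left out because it needs the full Unicode segmentation rules. The offsets are Python character indexes, which line up with the `start_offset` / `end_offset` values for plain ASCII text like the sample sentence.

```python
import re


def tokenize(text, pattern, lowercase=False):
    """Return (token, start_offset, end_offset) for every regex match."""
    tokens = []
    for match in re.finditer(pattern, text):
        token = match.group().lower() if lowercase else match.group()
        tokens.append((token, match.start(), match.end()))
    return tokens


def simple(text):
    # Runs of letters only, lowercased; digits and punctuation act as separators.
    return tokenize(text, r"[^\W\d_]+", lowercase=True)


def whitespace(text):
    # Runs of non-whitespace characters, kept exactly as written.
    return tokenize(text, r"\S+")


def keyword(text):
    # The whole input as one token, unchanged.
    return [(text, 0, len(text))]


if __name__ == "__main__":
    sample = ("Elasticsearch is a powerful search engine. "
              "Visit www.elastic.co for more information.")
    for name, fn in (("simple", simple), ("whitespace", whitespace), ("keyword", keyword)):
        print(name, [token for token, _, _ in fn(sample)])
```

Running the script prints the token list for each strategy, which makes it easy to see that only the simple variant breaks `www.elastic.co` apart and changes the letter case.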


http://www.hkea.cn/news/14431072/