Fuzzy queries

Prefix search: prefix

Concept: a prefix query finds documents that contain a term beginning with the given prefix. It does not compute a relevance score.

Notes:
- A prefix query matches individual terms in the inverted index, not the field value as a whole.
- Prefix queries are slow.
- Prefix queries are not cached.
- Make the prefix as long as you can, so that fewer terms have to be scanned.

Syntax:

GET index/_search
{
  "query": {
    "prefix": {
      "field": { "value": "word_prefix" }
    }
  }
}
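
To make the "terms, not field" point concrete, here is a minimal Python sketch. It is not Elasticsearch code: it assumes a toy analyzer that lowercases and splits on whitespace, and `analyze` and `prefix_search` are hypothetical helper names introduced only for illustration.

```python
def analyze(text):
    """Toy stand-in for an analyzer (assumption): lowercase, split on whitespace."""
    return text.lower().split()


def prefix_search(docs, prefix):
    """Return ids of docs with at least one term that starts with `prefix`."""
    hits = []
    for doc_id, text in docs.items():
        if any(term.startswith(prefix) for term in analyze(text)):
            hits.append(doc_id)
    return hits


docs = {
    1: "elasticsearch stack",
    2: "my elasticsearch search",
    3: "kibana stack",
}

print(prefix_search(docs, "elas"))   # [1, 2]
# The raw field value of doc 2 starts with "my el", but no single term does.
print(prefix_search(docs, "my el"))  # []
```

The second call is the interesting one: the prefix is compared with each term separately, so a prefix that spans two words finds nothing on an analyzed field.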
index_prefixes: the defaults are min_chars: 2 and max_chars: 5.

# prefix: prefix search
DELETE my_index
# elasticsearch stack
# elasticsearch search
# el
# ela
# elas elasticsearch
PUT my_index
{
  "mappings": {
    "properties": {
      "text": {
        "analyzer": "ik_max_word",
        "type": "text",
        "index_prefixes": { "min_chars": 2, "max_chars": 4 },
        "fields": {
          "keyword": { "type": "keyword", "ignore_above": 256 }
        }
      }
    }
  }
}
GET my_index/_mapping
POST /my_index/_bulk?filter_path=items.*.error
{"index":{"_id":"1"}}
{"text":"城管打电话喊商贩去摆摊摊"}
{"index":{"_id":"2"}}
{"text":"笑果文化回应商贩老农去摆摊"}
{"index":{"_id":"3"}}
{"text":"老农耗时17年种出椅子树"}
{"index":{"_id":"4"}}
{"text":"夫妻结婚30多年AA制,被城管抓"}
{"index":{"_id":"5"}}
{"text":"黑人见义勇为阻止抢劫反被铐住"}
GET my_index/_search
GET my_index/_mapping
GET _analyze
{
  "text": ["夫妻结婚30多年AA制,被城管抓"]
}
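GET my_index/_search
{
  "query": {
    "prefix": {
      "text": { "value": "城管" }
    }
  }
}

Wildcard: wildcard

Concept: a wildcard operator is a placeholder that matches one or more characters. For example, the * wildcard operator matches zero or more characters. You can combine wildcard operators with other characters to build a wildcard pattern.

Note: a wildcard query also matches terms, not the field as a whole.

Syntax:

GET index/_search
{
  "query": {
    "wildcard": {
      "field": { "value": "word_with_wildcard" }
    }
  }
}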
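
The difference between matching a term and matching a field shows up as soon as a value contains a space. The sketch below approximates wildcard matching with Python's `fnmatch.fnmatchcase`. That is only an approximation chosen for illustration: `fnmatch` also understands `[...]` character classes, which is not something this article claims for Elasticsearch. The helper name `wildcard_hits` and the two term lists are assumptions.

```python
from fnmatch import fnmatchcase


def wildcard_hits(terms, pattern):
    """Return the terms that match `pattern` (* = zero or more characters)."""
    return [term for term in terms if fnmatchcase(term, pattern)]


# A keyword sub-field keeps the whole value as one term.
keyword_terms = ["my english"]
# An analyzed text field is split into separate terms.
text_terms = ["my", "english"]

print(wildcard_hits(keyword_terms, "my eng*ish"))  # ['my english']
print(wildcard_hits(text_terms, "my eng*ish"))     # []
print(wildcard_hits(text_terms, "eng*ish"))        # ['english']
```

A pattern that contains a space can only match where the whole value was indexed as a single term, which is why such a pattern would be aimed at a keyword sub-field.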
DELETE my_index
POST /my_index/_bulk
{ "index": { "_id": "1"} }
{ "text": "my english" }
{ "index": { "_id": "2"} }
{ "text": "my english is good" }
{ "index": { "_id": "3"} }
{ "text": "my chinese is good" }
{ "index": { "_id": "4"} }
{ "text": "my japanese is nice" }
{ "index": { "_id": "5"} }
{ "text": "my disk is full" }
DELETE product_en
POST /product_en/_bulk
{ "index": { "_id": "1"} }
{ "title": "my english", "desc": "shouji zhong de zhandouji", "price": 3999, "tags": [ "xingjiabi", "fashao", "buka", "1" ]}
{ "index": { "_id": "2"} }
{ "title": "xiaomi nfc phone", "desc": "zhichi quangongneng nfc,shouji zhong de jianjiji", "price": 4999, "tags": [ "xingjiabi", "fashao", "gongjiaoka", "asd2fgas" ]}
{ "index": { "_id": "3"} }
{ "title": "nfc phone", "desc": "shouji zhong de hongzhaji", "price": 2999, "tags": [ "xingjiabi", "fashao", "menjinka", "as345" ]}
{ "index": { "_id": "4"} }
{ "title": "xiaomi erji", "desc": "erji zhong de huangmenji", "price": 999, "tags": [ "low", "bufangshui", "yinzhicha", "4dsg" ]}
{ "index": { "_id": "5"} }
{ "title": "hongmi erji", "desc": "erji zhong de kendeji", "price": 399, "tags": [ "lowbee", "xuhangduan", "zhiliangx", "sdg5" ]}
GET my_index/_search
GET product_en/_search

GET my_index/_search
{
  "query": {
    "wildcard": {
      "text.keyword": { "value": "my eng*ish" }
    }
  }
}
GET product_en/_mapping
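# exact value
GET product_en/_search
{
  "query": {
    "wildcard": {
      "tags.keyword": { "value": "men*inka" }
    }
  }
}

Regular expression: regexp

Concept: the performance of a regexp query varies with the regular expression supplied. To improve performance, avoid wildcard patterns such as .* or .*?+ that have no prefix or suffix.

Syntax:

GET index/_search
{
  "query": {
    "regexp": {
      "field": { "value": "regex", "flags": "ALL" }
    }
  }
}

# regexp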
GET product_en/_search
GET product_en/_search
{
  "query": {
    "regexp": {
      "title": "[\\s\\S]*nfc[\\s\\S]*"
    }
  }
}
GET product_en/_search
GET product_en/_search
{
  "query": {
    "regexp": {
      "desc": { "value": "zh~dng", "flags": "COMPLEMENT" }
    }
  }
}
GET product_en/_search
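{
  "query": {
    "regexp": {
      "tags.keyword": { "value": ".*<2-3>.*", "flags": "INTERVAL" }
    }
  }
}

flags

ALL: enables all optional operators.

COMPLEMENT: enables the ~ operator, which negates the shortest pattern that follows it. For example:
  a~bc   # matches 'adc' and 'aec' but not 'abc'

INTERVAL: enables the <> operators, which match a numeric range. For example:
  foo<1-100>    # matches 'foo1', 'foo2' ... 'foo99', 'foo100'
  foo<01-100>   # matches 'foo01', 'foo02' ... 'foo99', 'foo100'

INTERSECTION: enables the & operator, which acts as an AND operator. The match succeeds only if the patterns on both its left and its right match. For example:
  aaa.+&.+bbb   # matches 'aaabbb'

ANYSTRING: enables the @ operator, which matches any entire string. You can combine @ with the & and ~ operators to build "everything except" logic. For example:
  @&~(abc.+)   # matches everything except terms beginning with 'abc'

Fuzzy query: fuzzy

A fuzzy query tolerates the following kinds of typo:
- a changed character (box → fox)
- a missing character (black → lack)
- an extra character (sic → sick)
- two adjacent characters swapped (act → cat)

Syntax:

GET index/_search
{
  "query": {
    "fuzzy": {
      "field": { "value": "keyword" }
    }
  }
}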
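
Each of the four typo kinds listed above costs exactly one edit once swaps of adjacent characters are counted as a single operation. The sketch below computes that edit distance with a textbook dynamic-programming table (the optimal string alignment form of Damerau-Levenshtein). It is an illustration of the distance only, not how Elasticsearch or Lucene evaluate a fuzzy query internally, and the function name `edit_distance` is an assumption.

```python
def edit_distance(a, b, transpositions=True):
    """Edit distance between a and b.

    Counts insertions, deletions and substitutions; when `transpositions`
    is True, a swap of two adjacent characters also counts as one edit.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


for source, target in [("box", "fox"), ("black", "lack"), ("sic", "sick"), ("act", "cat")]:
    print(source, "->", target, edit_distance(source, target))  # each pair prints 1

print(edit_distance("act", "cat", transpositions=False))  # 2
```

With `transpositions=False` the swap has to be paid for as two substitutions, which is why the last line prints 2 instead of 1.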
GET product_en/_search
GET product_en/_search
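{
  "query": {
    "fuzzy": {
      "desc": { "value": "quangongneng nfc", "fuzziness": "2" }
    }
  }
}

GET product_en/_search
{
  "query": {
    "match": {
      "desc": { "query": "nfe quasdasdasdasd", "fuzziness": 1 }
    }
  }
}

Parameters

- value: (required) the term to search for.
- fuzziness: the edit distance (0, 1 or 2). Bigger is not better: recall goes up, but the results become less accurate.
- transpositions: (optional, Boolean) whether a swap of two adjacent characters (ab → ba) counts as a single edit. Defaults to true.

The Damerau-Levenshtein distance between two pieces of text is the number of insertions, deletions, substitutions and transpositions needed to turn one string into the other.

Distance formula: Levenshtein is the formula from Lucene; Elasticsearch uses the improved variant, Damerau-Levenshtein. For axe → aex, the Levenshtein distance is 2 and the Damerau-Levenshtein distance is 1.

Phrase prefix: match_phrase_prefix

match_phrase

match_phrase analyzes the query text. For a document to match:
- the searched field must contain every term of the phrase;
- the terms must appear in the same order;
- no other term may sit between them in the field.

Concept

match_phrase_prefix behaves like match_phrase, with one addition: the last term of the query text is matched as a prefix. If the query is a single word such as a, it matches every document whose field contains a term starting with a. If the query is a phrase such as "this is ma", Elasticsearch first runs a prefix search for ma against the inverted index, then runs a match_phrase query over the documents found. (Some sources online say that match_phrase runs first and the prefix search second; that is wrong.)

Parameters

- analyzer: the analyzer used to tokenize the phrase.
- max_expansions: the maximum number of terms the prefix may expand to.
- boost: the weight of this query.
- slop: the gap allowed between the terms of the phrase. It tells match_phrase how far apart terms may be while the document still counts as a match. "How far apart" means how many times you have to move a term to make the query line up with the document.

How it works: https://www.elastic.co/cn/blog/found-fuzzy-search#performance-considerations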
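
The two-step behaviour described above (expand the last term as a prefix, then check the phrase) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Elasticsearch implementation: the term dictionary is a plain set, expansions are taken in sorted order, and slop is not modelled at all (terms must be adjacent). The function name and the sample dictionary are hypothetical.

```python
def match_phrase_prefix(doc_tokens, query_tokens, term_dictionary, max_expansions=50):
    """Toy phrase-prefix match with slop fixed at 0."""
    if not query_tokens:
        return False
    *head, last = query_tokens
    # Step 1: expand the last query token into indexed terms sharing that prefix.
    expansions = sorted(t for t in term_dictionary if t.startswith(last))
    expansions = expansions[:max_expansions]
    # Step 2: for each expansion, look for the exact phrase in the document.
    n = len(query_tokens)
    for term in expansions:
        phrase = head + [term]
        for start in range(len(doc_tokens) - n + 1):
            if doc_tokens[start:start + n] == phrase:
                return True
    return False


doc = "shouji zhong de hongzhaji".split()
terms = {"shouji", "zhong", "de", "hongzhaji", "hongmi", "erji"}

print(match_phrase_prefix(doc, "zhong de hong".split(), terms))  # True
print(match_phrase_prefix(doc, "de zhong hong".split(), terms))  # False: wrong order
# With a single expansion only "hongmi" is tried, so the phrase is missed.
print(match_phrase_prefix(doc, "zhong de hong".split(), terms, max_expansions=1))  # False
```

The last call shows why max_expansions matters: when the prefix expands to more terms than the limit allows, a document that would otherwise match can be left out.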
# match_phrase_prefix
GET product_en/_search
{
  "query": {
    "match_phrase": {
      "desc": "shouji zhong de"
    }
  }
}

GET product_en/_search
{
  "query": {
    "match_phrase_prefix": {
      "desc": { "query": "de zhong shouji hongzhaji", "max_expansions": 50, "slop": 3 }
    }
  }
}

GET product_en/_search
{
  "query": {
    "match_phrase_prefix": {
      "desc": { "query": "zhong hongzhaji", "max_expansions": 50, "slop": 3 }
    }
  }
}

# source: zhong de hongzhaji
# query:  zhong hongzhaji

# source: shouji zhong de hongzhaji
# query:  de zhong shouji hongzhaji
# de shouji/zhong hongzhaji     after 1 move
# shouji/de zhong hongzhaji     after 2 moves
# shouji zhong/de hongzhaji     after 3 moves
# shouji zhong de hongzhaji     after 4 moves

N-gram and edge ngram

tokenizer

GET _analyze
{
  "tokenizer": "ngram",
  "text": "reba always loves me"
}

token filter

GET _analyze
{
  "tokenizer": "ik_max_word",
  "filter": [ "ngram" ],
  "text": "reba always loves me"
}

min_gram: the minimum length of the grams produced at index time.
max_gram: the maximum length of the grams produced at index time.
ngram: emits grams starting from every character, stepping through the allowed lengths. Suited to prefix and infix search.
edge_ngram: emits grams anchored at the first character only, stepping through the allowed lengths. Suited to prefix matching.

# ngram and edge-ngram
# ngram: min_gram 1, max_gram 2
GET _analyze
{
  "tokenizer": "ik_max_word",
  "filter": [ "edge_ngram" ],
  "text": "reba always loves me"
}
# min_gram 1, max_gram 1
# r a l m
# min_gram 1, max_gram 2
# r a l m
# re al lo me
# min_gram 2, max_gram 3
# re al lo me
# reb alw lov
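
The gram lists in the comments above can be reproduced with two small Python functions. This is a sketch of the idea only: it works on tokens that are already split on whitespace, which resembles the token filter usage rather than the ngram tokenizer, and the function names are assumptions.

```python
def edge_ngrams(token, min_gram, max_gram):
    """Grams anchored at the first character, one per allowed length."""
    return [token[:n] for n in range(min_gram, max_gram + 1) if n <= len(token)]


def ngrams(token, min_gram, max_gram):
    """Grams starting at every character, one per allowed length."""
    return [
        token[i:i + n]
        for i in range(len(token))
        for n in range(min_gram, max_gram + 1)
        if i + n <= len(token)
    ]


tokens = "reba always loves me".split()

for lo, hi in [(1, 1), (1, 2), (2, 3)]:
    print(lo, hi, [edge_ngrams(t, lo, hi) for t in tokens])
# 1 1 [['r'], ['a'], ['l'], ['m']]
# 1 2 [['r', 're'], ['a', 'al'], ['l', 'lo'], ['m', 'me']]
# 2 3 [['re', 'reb'], ['al', 'alw'], ['lo', 'lov'], ['me']]

print(ngrams("me", 1, 2))  # ['m', 'me', 'e']
```

Note that "me" has only two characters, so with min_gram 2 and max_gram 3 it yields a single gram. The ngram variant also emits grams from the middle of a token ("e" here), which is what makes infix search possible.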
PUT my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "2_3_edge_ngram": { "type": "edge_ngram", "min_gram": 2, "max_gram": 3 }
      },
      "analyzer": {
        "my_edge_ngram": { "type": "custom", "tokenizer": "standard", "filter": [ "2_3_edge_ngram" ] }
      }
    }
  },
  "mappings": {
    "properties": {
      "text": { "type": "text", "analyzer": "my_edge_ngram", "search_analyzer": "standard" }
    }
  }
}
GET /my_index/_mapping
POST /my_index/_bulk
{ "index": { "_id": "1"} }
{ "text": "my english" }
{ "index": { "_id": "2"} }
{ "text": "my english is good" }
{ "index": { "_id": "3"} }
{ "text": "my chinese is good" }
{ "index": { "_id": "4"} }
{ "text": "my japanese is nice" }
{ "index": { "_id": "5"} }
{ "text": "my disk is full" }
GET /my_index/_search
GET /my_index/_mapping
GET /my_index/_search
{
  "query": {
    "match_phrase": { "text": "my eng is goo" }
  }
}
PUT my_index2
{
  "settings": {
    "analysis": {
      "filter": {
        "2_3_grams": { "type": "edge_ngram", "min_gram": 2, "max_gram": 3 }
      },
      "analyzer": {
        "my_edge_ngram": { "type": "custom", "tokenizer": "standard", "filter": [ "2_3_grams" ] }
      }
    }
  },
  "mappings": {
    "properties": {
      "text": { "type": "text", "analyzer": "my_edge_ngram", "search_analyzer": "standard" }
    }
  }
}
GET /my_index2/_mapping
POST /my_index2/_bulk
{ "index": { "_id": "1"} }
{ "text": "my english" }
{ "index": { "_id": "2"} }
{ "text": "my english is good" }
{ "index": { "_id": "3"} }
{ "text": "my chinese is good" }
{ "index": { "_id": "4"} }
{ "text": "my japanese is nice" }
{ "index": { "_id": "5"} }
{ "text": "my disk is full" }
GET /my_index2/_search
{
  "query": {
    "match_phrase": { "text": "my eng is goo" }
  }
}
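
Why would a phrase of word fragments such as "my eng is goo" find anything? A plausible reading is that the index analyzer stores the 2 and 3 character edge grams of every word at that word's position, while the search analyzer leaves the query words untouched, so each fragment lines up with a stored gram. The Python sketch below models that under simplifying assumptions: whitespace splitting stands in for the standard tokenizer, grams inherit the position of their source word, and all function names are hypothetical.

```python
def edge_ngrams(token, min_gram=2, max_gram=3):
    """Grams anchored at the first character, one per allowed length."""
    return [token[:n] for n in range(min_gram, max_gram + 1) if n <= len(token)]


def index_terms(text):
    """Index-time analysis (assumed): term -> set of word positions."""
    postings = {}
    for pos, token in enumerate(text.lower().split()):
        for gram in edge_ngrams(token):
            postings.setdefault(gram, set()).add(pos)
    return postings


def phrase_match(doc_text, query_text):
    """True if the query words occur at consecutive positions in the doc."""
    postings = index_terms(doc_text)
    query = query_text.lower().split()  # search-time analysis: no grams
    if not query:
        return False
    for start in postings.get(query[0], set()):
        if all(start + i in postings.get(term, set()) for i, term in enumerate(query)):
            return True
    return False


docs = {
    1: "my english",
    2: "my english is good",
    3: "my chinese is good",
    4: "my japanese is nice",
    5: "my disk is full",
}

hits = [doc_id for doc_id, text in docs.items() if phrase_match(text, "my eng is goo")]
print(hits)  # [2]
```

In this toy model a query word longer than max_gram, such as "english", matches nothing, because only "en" and "eng" were stored for it. That trade-off is worth checking against a real index before relying on this kind of mapping.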
GET _analyze
{
  "tokenizer": "ik_max_word",
  "filter": [ "ngram" ],
  "text": "用心做皮肤,用脚做游戏"
}