Elasticsearch Queries

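Search is one of the core features of Elasticsearch (ES). There are two ways to search. The first passes all query parameters in the URL of a GET request. The second sends JSON in the body of a POST request and supports the full Elasticsearch Query DSL. The Query DSL is a flexible, expressive query language that exposes most of Lucene's functionality through a simple JSON interface. It is what you should use to write queries in your application, because it makes them more flexible, more precise, and easier to read and debug.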

For example, to find documents in the customer index that contain the keyword "深圳" in any field:

GET /customer/_search?q=深圳

{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 1.7140323,
        "hits": [
            {
                "_index": "customer",
                "_type": "_doc",
                "_id": "1",
                "_score": 1.7140323,
                "_source": {
                    "name": "张真人",
                    "address": "广东深圳"
                }
            }
        ]
    }
}

You can also use q=* to return all documents, or call _search with no parameters at all.
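
When the request is sent from code rather than from a console that encodes the URL for you, a non-ASCII keyword such as 深圳 usually has to be percent-encoded before it goes into the q parameter. A minimal Python sketch using only the standard library (the helper name build_uri_search is made up for illustration):

```python
from urllib.parse import urlencode


def build_uri_search(index, keyword):
    """Return the path and query string for a URI search on `index`."""
    # urlencode percent-encodes the keyword as UTF-8 bytes.
    return f"/{index}/_search?" + urlencode({"q": keyword})


print(build_uri_search("customer", "深圳"))
```

The returned string is only the path and query string; the host and port of the cluster still need to be prepended.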

Everything discussed below uses a POST request with JSON in the request body.

Single-field query

Use match to query a single field.

POST /customer/_search

# body
{
    "query" : {
         "match" : {"name":"张"}
    }
}

# Response
{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 3,
            "relation": "eq"
        },
        "max_score": 0.27951443,
        "hits": [
            {
                "_index": "customer",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.27951443,
                "_source": {
                    "name": "张真人",
                    "address": "广东深圳",
                    "age": 100
                }
            },
            {
                "_index": "customer",
                "_type": "_doc",
                "_id": "3",
                "_score": 0.27951443,
                "_source": {
                    "name": "张无忌",
                    "address": "万安寺",
                    "age": 20
                }
            },
            {
                "_index": "customer",
                "_type": "_doc",
                "_id": "0xMBGGoB15mDBRoAwkbg",
                "_score": 0.27951443,
                "_source": {
                    "name": "张翠山",
                    "address": "湖北十堰",
                    "age": 108
                }
            }
        ]
    }
}

Pagination

from and size specify the zero-based offset of the first hit to return and the maximum number of hits to return.

POST /customer/_search

# body
{ 
    "query" : {
         "match" : {"name":"张"}
    },
    "size":2,
    "from": 2
}
# Response
{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 3,
            "relation": "eq"
        },
        "max_score": 0.27951443,
        "hits": [
            {
                "_index": "customer",
                "_type": "_doc",
                "_id": "0xMBGGoB15mDBRoAwkbg",
                "_score": 0.27951443,
                "_source": {
                    "name": "张翠山",
                    "address": "湖北十堰",
                    "age": 108
                }
            }
        ]
    }
}
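
Applications usually think in page numbers rather than offsets. The sketch below shows one way to translate a 1-based page number into from and size and attach them to a match query. The function names and the 1-based page convention are assumptions made for illustration, not part of Elasticsearch.

```python
def page_to_from_size(page, per_page):
    """Convert a 1-based page number into Elasticsearch from/size values."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must both be >= 1")
    return {"from": (page - 1) * per_page, "size": per_page}


def paginated_match(field, text, page, per_page):
    """Build a match query body for the requested page."""
    body = {"query": {"match": {field: text}}}
    body.update(page_to_from_size(page, per_page))
    return body


# Page 2 with 2 hits per page gives the same from/size as the request above.
print(paginated_match("name", "张", page=2, per_page=2))
```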

Selecting returned fields

_source specifies the list of fields to return.

POST /customer/_search
# body
{
    "query" : {
         "match" : {"name":"张"}
    },
    "_source": ["name"]
}

# Response
{
    "took": 5,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 3,
            "relation": "eq"
        },
        "max_score": 0.27951443,
        "hits": [
            {
                "_index": "customer",
                "_type": "_doc",
                "_id": "0xMBGGoB15mDBRoAwkbg",
                "_score": 0.27951443,
                "_source": {
                    "name": "张翠山"
                }
            }
        ]
    }
}

Highlighting

POST /customer/_search
# body

{
    "query" : {
         "match" : {"name":"张"}
    },
    "highlight": {
        "fields" : {
            "name" : {}
        }
    }
}

# Response

{
    "took": 80,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 3,
            "relation": "eq"
        },
        "max_score": 0.27951443,
        "hits": [
            {
                "_index": "customer",
                "_type": "_doc",
                "_id": "0xMBGGoB15mDBRoAwkbg",
                "_score": 0.27951443,
                "_source": {
                    "name": "张翠山"
                },
                "highlight": {
                    "name": [
                        "<em>张</em>翠山"
                    ]
                }
            }
        ]
    }
}

Multi-index search

# 1. Explicitly list the indices in the URL
curl -XGET 'http://localhost:9200/index1,index2/_search?q=yourQueryHere'

# 2. Search all indices
curl -XGET 'http://localhost:9200/_all/_search?q=yourQueryHere'

# 3. Or omit the index entirely
curl -XGET 'http://localhost:9200/_search?q=yourQueryHere'

Multi-field query

Use fields inside multi_match to list the fields to search. To search across all fields, use ["*"] instead.

POST /_search

{
    "query" : {
         "multi_match" : {
          "query":    "张", 
          "fields": [ "name", "title" ] 
        }
    }
}

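Boosting field weights

When you search for a keyword in several fields, the fields may not be equally important. For example, both the title and the body may contain the keyword, but you want a title match to count for more. Append ^ and a number to a field name to boost its weight, so that documents with the keyword in the title rank first.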

{
    "query" : {
         "multi_match" : {
             "query":    "张", 
             "fields": [ "name^3", "address" ] 
         }
    }
}
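
If the boosts come from configuration rather than being hard-coded, the field^boost strings can be generated. A small sketch, assuming a hypothetical helper build_multi_match that takes a mapping of field name to boost and leaves a boost of 1 implicit:

```python
def build_multi_match(text, boosts):
    """Build a multi_match body from a {field: boost} mapping."""
    fields = []
    for field, boost in boosts.items():
        # A boost of 1 is the neutral weight, so the plain field name is enough.
        fields.append(field if boost == 1 else f"{field}^{boost:g}")
    return {"query": {"multi_match": {"query": text, "fields": fields}}}


# Produces the same body as the request above.
print(build_multi_match("张", {"name": 3, "address": 1}))
```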

Function Score Query

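By default, search results are sorted by each document's relevance. In Elasticsearch, function_score is the DSL for manipulating document scores: after the query runs, it re-scores every matching document and sorts by the resulting final score. You can attach one or more functions to a query, and several predefined scoring functions are provided:

script_score: supply your own scoring script
weight: set a weight
field_value_factor: compute a score from the value of a field
random_score: generate a random score from 0 up to (but not including) 1
Decay functions: also based on the value of a field; the closer the value is to a given point, the higher the score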

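The boost_mode property specifies how the score computed by the functions is combined with the original _score. It accepts the following options:

multiply: multiply by _score (the default)
sum: add to _score
min: take the smaller of the two values
avg: take the average of the two values
max: take the larger of the two values
replace: replace _score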
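
The six options can be read as plain arithmetic on two numbers. The sketch below is a simplified Python model of that combination step, not Elasticsearch's implementation; the name combine_scores is made up for illustration.

```python
def combine_scores(query_score, function_score, boost_mode="multiply"):
    """Combine the query _score with the function score, per boost_mode."""
    combiners = {
        "multiply": lambda q, f: q * f,
        "sum": lambda q, f: q + f,
        "min": min,
        "avg": lambda q, f: (q + f) / 2.0,
        "max": max,
        "replace": lambda q, f: f,  # the query score is discarded
    }
    return combiners[boost_mode](query_score, function_score)


for mode in ("multiply", "sum", "min", "avg", "max", "replace"):
    print(mode, combine_scores(2.0, 3.0, mode))
```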

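field_value_factor: the field value factor

Using a specific field of the document as one factor in the relevance score is useful in many scenarios, for example to score original articles higher. At search time, you can combine the function_score query with field_value_factor.

For this example, I first added an age field to every document, and I want older customers to rank higher.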

{
    "query" : {
        "function_score": {
            "query": {
                "multi_match" : {
                    "query" : "张",
                    "fields": ["name"]
                }
            },
            "field_value_factor": {
                "field" : "age",
                "modifier": "log1p",
                "missing":0,
                "factor" : 2
            }
        }
    }
}

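modifier can be one of: none (the default), log, log1p, log2p, ln, ln1p, ln2p, square, sqrt, and reciprocal.

none: no modification
log: take the common (base-10) logarithm
log1p: add 1 to the field value, then take the common logarithm
log2p: add 2 to the field value, then take the common logarithm
ln: take the natural logarithm
ln1p: add 1 to the field value, then take the natural logarithm
ln2p: add 2 to the field value, then take the natural logarithm
square: square the value
sqrt: take the square root
reciprocal: take the reciprocal

factor is multiplied with the field value before the modifier is applied. A factor greater than 1 amplifies the field's effect, and a factor less than 1 dampens it.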
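
To see how factor, modifier, and missing interact, the calculation can be modeled in a few lines of Python. This is a sketch of the arithmetic only, under the assumption that the function score is modifier(factor * field_value); the names MODIFIERS and field_value_factor are local to this example.

```python
import math

# Each modifier is applied to factor * field_value.
MODIFIERS = {
    "none": lambda n: n,
    "log": math.log10,
    "log1p": lambda n: math.log10(n + 1),
    "log2p": lambda n: math.log10(n + 2),
    "ln": math.log,
    "ln1p": math.log1p,
    "ln2p": lambda n: math.log(n + 2),
    "square": lambda n: n * n,
    "sqrt": math.sqrt,
    "reciprocal": lambda n: 1.0 / n,
}


def field_value_factor(value, factor=1.0, modifier="none", missing=None):
    """Model the function score for one document's field value."""
    if value is None:
        if missing is None:
            raise ValueError("field is missing and no `missing` default was given")
        value = missing  # fall back to the configured default
    return MODIFIERS[modifier](factor * value)


# Ages from the sample data, with the settings used in the query above.
for name, age in [("张翠山", 108), ("张真人", 100), ("张无忌", 20)]:
    print(name, field_value_factor(age, factor=2, modifier="log1p", missing=0))
```

With factor 2 and log1p, age 108 gives roughly 2.34 and age 20 roughly 1.61. Since boost_mode defaults to multiply and the three documents have the same _score, the older customers end up ranked first.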

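Decay functions

Just as some factors should raise a score as they grow, others should lower it. Take time, where more recent is better, or location, where closer is better. Decay functions let us compute a score from a single-valued field such as a date, a geo-point, or a numeric field like a price.

The three decay functions are:

Linear decay (linear)
Exponential decay (exp)
Gaussian decay (gauss)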

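origin: the central point, or the best possible value for the field. Documents that fall exactly on origin get the full score of 1.0. For example, if you want more recent publication dates to score higher, the best value is the current time.

scale: the rate of decay, that is, how quickly the score drops as a document moves away from origin (for example, every £10 or every 100 meters).

decay: the score a document receives at a distance of scale from origin (measured from the edge of offset when one is set). Defaults to 0.5.

offset: a non-zero offset expands the central point into a range rather than a single point. All values in the range origin - offset <= value <= origin + offset get a score of 1.0.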

For example:

POST /_search
{
    "query": {
        "function_score": {
            "query": {
                "multi_match" : {
                    "query" : "search engine",
                    "fields": ["title", "summary"]
                }
            },
            "functions": [
                {
                    "exp": {
                        "publish_date" : {
                            "origin": "2014-06-15",
                            "offset": "7d",
                            "scale" : "30d"
                        }
                    }
                }
            ],
            "boost_mode" : "replace"
        }
    },
    "_source": ["title", "summary", "publish_date", "num_reviews"]
}
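
The roles of origin, scale, decay, and offset are easier to see as formulas. The Python sketch below follows the formulas given in the Elasticsearch function_score reference, with values expressed as plain numbers (for a date field, think of days). The function names are made up for this example, and it models the arithmetic only.

```python
import math


def _distance(value, origin, offset):
    """Distance from origin, with everything inside offset counted as 0."""
    return max(0.0, abs(value - origin) - offset)


def linear_decay(value, origin, scale, offset=0.0, decay=0.5):
    s = scale / (1.0 - decay)
    return max(0.0, (s - _distance(value, origin, offset)) / s)


def exp_decay(value, origin, scale, offset=0.0, decay=0.5):
    lam = math.log(decay) / scale  # negative, because decay < 1
    return math.exp(lam * _distance(value, origin, offset))


def gauss_decay(value, origin, scale, offset=0.0, decay=0.5):
    sigma_sq = -scale ** 2 / (2.0 * math.log(decay))
    return math.exp(-_distance(value, origin, offset) ** 2 / (2.0 * sigma_sq))


# Days away from origin, with offset=7 and scale=30 as in the query above.
for days in (0, 7, 37, 67):
    print(days, exp_decay(days, origin=0, scale=30, offset=7))
```

Any document within 7 days of origin scores 1.0. At 37 days (offset plus scale) all three functions return the decay value 0.5, and they differ only in how quickly they fall off beyond that point.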

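Bool query

A bool query combines conditions into a more complex query, for example searching for articles titled "Python" that must also have more than 1000 views, or whose publication date must fall within a given range.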

POST _search
{
  "query": {
    "bool" : {
      "must" : {
        "term" : { "user" : "kimchy" }
      },
      "filter": {
        "term" : { "tag" : "tech" }
      },
      "must_not" : {
        "range" : {
          "age" : { "gte" : 10, "lte" : 20 }
        }
      },
      "should" : [
        { "term" : { "tag" : "wow" } },
        { "term" : { "tag" : "elasticsearch" } }
      ],
      "minimum_should_match" : 1,
      "boost" : 1.0
    }
  }
}

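The difference between must and should is that must clauses are strictly required, while should clauses are optional unless minimum_should_match demands some of them. Both accept either a single object (dict) or an array of objects.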
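
To make the matching rules concrete, here is a simplified Python model that evaluates such a bool query against an in-memory dict. It is only a sketch: it handles just term and range clauses, ignores scoring and boost, and every function name in it is hypothetical.

```python
def _as_list(clauses):
    """Accept a single clause (dict), a list of clauses, or None."""
    if clauses is None:
        return []
    return clauses if isinstance(clauses, list) else [clauses]


def _clause_matches(clause, doc):
    """Evaluate one term or range clause against a plain dict."""
    (kind, spec), = clause.items()
    (field, cond), = spec.items()
    value = doc.get(field)
    if kind == "term":
        values = value if isinstance(value, list) else [value]
        return cond in values
    if kind == "range":
        if value is None:
            return False
        return cond.get("gte", value) <= value <= cond.get("lte", value)
    raise ValueError(f"unsupported clause type: {kind}")


def bool_matches(bool_query, doc):
    must = _as_list(bool_query.get("must"))
    filt = _as_list(bool_query.get("filter"))
    must_not = _as_list(bool_query.get("must_not"))
    should = _as_list(bool_query.get("should"))
    # Without must or filter, at least one should clause has to match.
    default_msm = 1 if should and not (must or filt) else 0
    msm = bool_query.get("minimum_should_match", default_msm)
    if not all(_clause_matches(c, doc) for c in must + filt):
        return False
    if any(_clause_matches(c, doc) for c in must_not):
        return False
    return sum(_clause_matches(c, doc) for c in should) >= msm


query = {
    "must": {"term": {"user": "kimchy"}},
    "filter": {"term": {"tag": "tech"}},
    "must_not": {"range": {"age": {"gte": 10, "lte": 20}}},
    "should": [{"term": {"tag": "wow"}}, {"term": {"tag": "elasticsearch"}}],
    "minimum_should_match": 1,
}
print(bool_matches(query, {"user": "kimchy", "tag": ["tech", "wow"], "age": 30}))
```

In this model, filter behaves exactly like must; in Elasticsearch the difference is that filter clauses do not contribute to the score.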

References:

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#function-script-score
https://n3xtchen.github.io/n3xtchen/elasticsearch/2017/07/05/elasticsearch-23-useful-query-example
https://www.elastic.co/guide/en/elasticsearch/reference/7.0/query-dsl-function-score-query.html#function-field-value-factor
https://www.elastic.co/guide/cn/elasticsearch/guide/current/combining-filters.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
