v1.6.0-rc.0

meilisearch/meilisearch

Release date: 2023-12-18 22:43:48

Latest release of meilisearch/meilisearch: v1.10.1 (2024-09-02 17:59:14)

⚠️ If you use the Meilisearch Docker image, please use rc1 or later instead.

⚠️ Since this is a release candidate (RC), we do NOT recommend using it in a production environment. Is something not working as expected? We welcome bug reports and feedback about new features.

Since we know that indexing time is a real pain point for some of our users, Meilisearch v1.6 focuses mainly on indexing performance. But this new version is not only about optimization: Meilisearch now includes embedders for vector search. You can benefit from the power of Meilisearch with semantic and hybrid searches!

New features and improvements 🔥

Experimental: improve vector search

Meilisearch introduces a hybrid search mechanism that allows users to mix full-text and semantic search at search time to provide more accurate and comprehensive results.

Plus, you can directly define the embedders you want to use, so you don't need to interact with a third party on your side to generate embeddings: Meilisearch will interact with it for you.

Settings

Before using hybrid search, you need to define an embedder in your settings. You can even define multiple embedders in the index settings.

You must set them via the PATCH /indexes/:index_uid/settings route. Here is an example of a payload defining 3 embedders named default, image and translation:

{
  "embedders": {
    "default": {
      "source": {
        "openAi": {
          "apiKey": "<your-OpenAI-API-key>",
          "model": "text-embedding-ada-002"
        }
      },
      "documentTemplate": {
        "template": "A movie titled '{{doc.title}}' whose description starts with {{doc.overview|truncatewords: 20}}"
      }
    },
    "image": {
      "source": { "userProvided": { "dimensions": 512 } }
    },
    "translation": {
      "source": {
        "huggingFace": { "model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2" }
      },
      "documentTemplate": {
        "template": "A movie titled '{{doc.title}}' whose description starts with {{doc.overview|truncatewords: 20}}"
      }
    }
  }
}
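
As a rough sketch of how a payload like the one above could be sent, the Python snippet below builds the PATCH request with the standard library only. The address http://localhost:7700, the index name movies and the helper build_settings_request are assumptions made for illustration and are not part of the release notes. Nothing is sent unless you uncomment the last lines, and an instance protected by an API key would also need an Authorization header.

```python
import json
import urllib.request

MEILI_URL = "http://localhost:7700"  # assumption: local instance on the default port


def build_settings_request(index_uid: str, embedders: dict) -> urllib.request.Request:
    """Build (but do not send) a PATCH /indexes/:index_uid/settings request."""
    body = json.dumps({"embedders": embedders}).encode("utf-8")
    return urllib.request.Request(
        url=f"{MEILI_URL}/indexes/{index_uid}/settings",
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )


# Only the user-provided embedder from the payload above, to keep the sketch short.
embedders = {"image": {"source": {"userProvided": {"dimensions": 512}}}}
request = build_settings_request("movies", embedders)  # "movies" is a hypothetical index

# To actually send it against a running instance:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```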

documentTemplate is a view of your document that will serve as the base for computing the embedding. This field is a JSON string expecting Liquid format.
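
To get a feel for the text that ends up being embedded, here is a small Python approximation of the template used in the example payload. It is not a Liquid engine: truncate_words and render_movie_template are hypothetical helpers that only mimic {{doc.title}} and the truncatewords filter (assumed here to keep the first N words and append an ellipsis when it cuts), and the sample document is made up.

```python
def truncate_words(text: str, max_words: int) -> str:
    """Rough stand-in for Liquid's truncatewords filter (assumed behaviour)."""
    words = text.split()
    if len(words) <= max_words:
        return " ".join(words)
    return " ".join(words[:max_words]) + "..."


def render_movie_template(doc: dict) -> str:
    """Hand-written equivalent of the documentTemplate from the example payload."""
    overview = truncate_words(doc["overview"], 20)
    return f"A movie titled '{doc['title']}' whose description starts with {overview}"


doc = {  # made-up sample document
    "title": "Example Movie",
    "overview": "A short overview that fits within the twenty word limit.",
}
print(render_movie_template(doc))
```

The printed string is what such a template would hand to the embedder for this document, so short, descriptive templates keep the embedded text focused on the fields that matter for search.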

Three kinds of embedders are available for source: openAi, huggingFace and userProvided (all three appear in the payload above).

⚠️ If using the HuggingFace model, the computation will be done on your machine and will use your CPU (not your GPU), which can lead to bad indexing performance.

Hybrid & semantic search

You can perform a hybrid search by using the hybrid field when calling the POST /indexes/:index_uid/search route.

Here is an example of a hybrid search payload:

{
    "q": "Plumbers and dinosaurs",
    "hybrid": {
        "semanticRatio": 0.9,
        "embedder": "default"
    }
}

embedder is the embedder used to perform the search, chosen among the ones you defined in your settings. semanticRatio must be between 0 and 1 and defaults to 0.5: 1 corresponds to a full semantic search, whereas 0 corresponds to a full-text search.
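
A minimal Python sketch of such a search call is shown below, using only the standard library. The address http://localhost:7700, the index name movies and the helper build_hybrid_search_request are assumptions for illustration; the path uses the same /indexes/:index_uid prefix as the settings route. The request is built but not sent.

```python
import json
import urllib.request

MEILI_URL = "http://localhost:7700"  # assumption: local instance on the default port


def build_hybrid_search_request(
    index_uid: str, query: str, embedder: str, semantic_ratio: float = 0.5
) -> urllib.request.Request:
    """Build (but do not send) a hybrid search request."""
    # The release notes state that semanticRatio must lie between 0 and 1.
    if not 0.0 <= semantic_ratio <= 1.0:
        raise ValueError("semanticRatio must be between 0 and 1")
    payload = {
        "q": query,
        "hybrid": {"semanticRatio": semantic_ratio, "embedder": embedder},
    }
    return urllib.request.Request(
        url=f"{MEILI_URL}/indexes/{index_uid}/search",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# "movies" is a hypothetical index name; the other values come from the payload above.
request = build_hybrid_search_request("movies", "Plumbers and dinosaurs", "default", 0.9)

# To actually run the search against a live instance:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```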

⚠️ Breaking changes for beta users of the previous version of vector search

For people who used Meilisearch with the experimental vector search feature (between v1.3.0 and v1.5.0), some changes happened in the API usage:

"embedders": {
    "default": {
      "source": { "userProvided": { "dimensions": 512 } }
    }
}

Before, in your document you provided:

"_vectors": [
  [0.0, 0.1]
]

Now the format is:

"_vectors": {
  "image2text": [0.0, 0.1, ...]
}
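
If you have existing documents in the old format, a conversion along these lines may help. This is only a sketch: migrate_vectors is a hypothetical helper, the embedder name is whatever you declared in your settings, and documents carrying several vectors are deliberately rejected because that case is not described here.

```python
def migrate_vectors(document: dict, embedder_name: str) -> dict:
    """Return a copy of `document` with `_vectors` in the new object format."""
    old = document.get("_vectors")
    if old is None or isinstance(old, dict):
        return dict(document)  # nothing to migrate, or already in the new format
    # Old format shown above: a list holding one vector, e.g. [[0.0, 0.1]].
    if old and isinstance(old[0], list):
        if len(old) != 1:
            raise ValueError("this sketch only handles a single vector per document")
        vector = old[0]
    else:
        vector = old  # assumption: a flat list is already a single vector
    migrated = dict(document)
    migrated["_vectors"] = {embedder_name: vector}
    return migrated


old_doc = {"id": 1, "_vectors": [[0.0, 0.1]]}
new_doc = migrate_vectors(old_doc, "default")
print(new_doc)  # {'id': 1, '_vectors': {'default': [0.0, 0.1]}}
```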

To learn more about the new usage, refer to the settings section above or to the documentation.

More technical information

You can check out

Done in #4226 by @dureuill, @irevoire, @Kerollmops and @ManyTheFish.

Improve indexing speed

This version introduces huge indexing performance improvements. Meilisearch has been optimized to

Some metrics: on an e-commerce dataset of 2.5 GB of documents, we measured a time reduction of more than 50% when adding documents for the first time. In a scenario where documents are updated frequently and partially, the reduction is about 50%, or even 75%. Most of all, indexing time no longer increases exponentially.

⚠️ Performance improvements can depend heavily on your dataset, the size of your machine and the way you index documents.

Done in #4090 by @ManyTheFish, @dureuill and @Kerollmops.

Disk space usage reduction

We made improvements regarding disk space usage. Meilisearch now stores less internal data and therefore requires a smaller database on your disk.

With a ~15 MB dataset, the created database is between 40% and 50% smaller. Additionally, after several updates, the database size becomes more stable, which was not the case before. So, the more documents you add, the more visible this improvement will be.

Customize proximity precision to gain indexing performance

Still with the aim of reducing indexing time, you can now customize the accuracy of the proximity ranking rule based on your needs.

The computation needed for the proximity ranking rule is heavy and can lead to long indexing times. Since the relevancy gain brought by the proximity ranking rule is not necessary for every use case, you can now make it less precise to reduce indexing time. Depending on your use case, the relevancy impact may not even be visible.

Use the proximityPrecision setting:

curl \
  -X PATCH 'http://localhost:7700/indexes/books/settings' \
  -H 'Content-Type: application/json'  \
  --data-binary '{
    "proximityPrecision": "byAttribute"
  }'

The default value of proximityPrecision is byWord. byAttribute will improve your indexing performance but can impact the relevancy.

Technical explanation: byWord computes proximity as the exact distance between words, whereas byAttribute only checks whether the words appear in the same attribute, which makes it less accurate.
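
The difference can be illustrated with a toy Python model. This is not Meilisearch's implementation: the two functions are hypothetical and only mirror the description above, with the word-level measure counting how many positions separate two words inside an attribute and the attribute-level check merely testing that both words occur in the same attribute.

```python
from typing import Optional


def by_word_distance(text: str, first: str, second: str) -> Optional[int]:
    """Smallest gap, in word positions, between `first` and `second` in one attribute."""
    words = text.lower().split()
    first_positions = [i for i, word in enumerate(words) if word == first]
    second_positions = [i for i, word in enumerate(words) if word == second]
    gaps = [abs(a - b) for a in first_positions for b in second_positions]
    return min(gaps) if gaps else None


def by_attribute_match(text: str, first: str, second: str) -> bool:
    """True when both words appear somewhere in the same attribute."""
    words = set(text.lower().split())
    return first in words and second in words


# Two made-up attribute values containing the same pair of query words.
close = "the plumber rescued dinosaurs"
far = "the plumber left town long before any of the dinosaurs arrived"

print(by_word_distance(close, "plumber", "dinosaurs"))    # 2
print(by_word_distance(far, "plumber", "dinosaurs"))      # 8
print(by_attribute_match(close, "plumber", "dinosaurs"))  # True
print(by_attribute_match(far, "plumber", "dinosaurs"))    # True
```

The word-level measure separates the two texts (2 versus 8 positions), while the attribute-level check cannot tell them apart. That is the accuracy being traded for cheaper indexing.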

Done in #4225 by @ManyTheFish.

Experimental: limit the number of batched tasks

To speed up indexing performance, Meilisearch batches similar tasks to process them as a big batch. However, sometimes, the huge amount of enqueued tasks leads to issues with Meilisearch crashing or being stuck.

To limit the number of batched tasks, you can configure it at launch: use the environment variable MEILI_EXPERIMENTAL_MAX_NUMBER_OF_BATCHED_TASKS, the CLI argument --experimental-max-number-of-batched-tasks when launching Meilisearch, or set it directly in the config file.
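
As an illustration of the environment-variable route, the Python sketch below prepares the environment for a Meilisearch process. The helper names, the value 100 and the assumption that a meilisearch binary is available on PATH are all illustrative. The process is only started if you call launch_meilisearch yourself.

```python
import os
import subprocess
from typing import Optional


def build_launch_env(max_batched_tasks: int, base_env: Optional[dict] = None) -> dict:
    """Return a copy of the environment with the batched-task limit set."""
    env = dict(os.environ if base_env is None else base_env)
    env["MEILI_EXPERIMENTAL_MAX_NUMBER_OF_BATCHED_TASKS"] = str(max_batched_tasks)
    return env


def launch_meilisearch(max_batched_tasks: int) -> subprocess.Popen:
    """Start Meilisearch with the limit applied (assumes `meilisearch` is on PATH)."""
    return subprocess.Popen(["meilisearch"], env=build_launch_env(max_batched_tasks))


# 100 is an arbitrary example value, not a recommendation.
env = build_launch_env(100, base_env={})
print(env)  # {'MEILI_EXPERIMENTAL_MAX_NUMBER_OF_BATCHED_TASKS': '100'}
```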

Done in #4249 by @Kerollmops

Fixes 🐞

Misc

❤️ Thanks again to our external contributors:

1. meilisearch-linux-aarch64 (121.57 MB)

2. meilisearch-linux-amd64 (122.38 MB)

3. meilisearch-macos-amd64 (114.15 MB)

4. meilisearch-macos-apple-silicon (112.88 MB)

5. meilisearch-windows-amd64.exe (112 MB)