v1.7.0-rc.0
Release date: 2024-02-12 19:38:10
⚠️ Since this is a release candidate (RC), we do NOT recommend using it in a production environment. Is something not working as expected? We welcome bug reports and feedback about new features.
Meilisearch v1.7.0 mostly focuses on improving v1.6.0 features, indexing speed and hybrid search. GPU computing is now supported.
New features and improvements 🔥
Improve AI with Meilisearch (experimental feature)
🗣️ AI work is still experimental, and we need your help to improve it! Share your thoughts and feedback on this GitHub discussion.
To use it, you need to enable vectorStore through the /experimental-features route.
💡 More documentation about AI search with Meilisearch here.
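For reference, a minimal sketch of enabling it on a local instance (default port and no master key assumed; this is the same call shown in the GPU steps further down):
curl \
  -X PATCH 'http://localhost:7700/experimental-features/' \
  -H 'Content-Type: application/json' \
  --data-binary '{ "vectorStore": true }'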
Add new OpenAI embedding models & ability to override their dimensions
When using openAi as source in your embedders index settings (an example here), you can now specify two new models:
- text-embedding-3-small with a default dimension of 1536
- text-embedding-3-large with a default dimension of 3072
The new models:
- are cheaper
- produce more relevant results in standardized tests
- allow setting the dimensions of the embeddings to control the trade-off between accuracy and performance (including storage)
This means it is now possible to pass the dimensions field when using the openAi source, which was previously only available for the userProvided source.
There are some rules, though, which we detail with these examples:
"embedders": {
"large": {
"source": "openAi",
"model": "text-embedding-3-large",
"dimensions": 512 // must be >0, must be <= 3072 for "text-embedding-3-large"
},
"small": {
"source": "openAi",
"model": "text-embedding-3-small",
"dimensions": 1024 // must be >0, must be <= 1536 for "text-embedding-3-small"
},
"legacy": {
"source": "openAi",
"model": "text-embedding-ada-002",
"dimensions": 1536 // must =1536 for "text-embedding-ada-002"
},
"omitted_dimensions": { // uses the default dimension
"source": "openAi",
"model": "text-embedding-ada-002",
}
}
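As an illustration, here is a minimal sketch of pushing one of the embedders above to an index through the settings route (your_index, the apiKey placeholder, and the local default port are assumptions to adapt to your setup):
curl \
  -X PATCH 'http://localhost:7700/indexes/your_index/settings/embedders' \
  -H 'Content-Type: application/json' \
  --data-binary '{ "small": { "source": "openAi", "apiKey": "<your-openai-api-key>", "model": "text-embedding-3-small", "dimensions": 1024 } }'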
Done in #4375 by @Gosti.
Add GPU support to compute embeddings
Enabling the cuda feature allows using an available GPU to compute embeddings with a huggingFace embedder.
On an AWS Graviton 2, this yields a 3x to 5x improvement in indexing time.
👇 How to enable GPU support through CUDA for HuggingFace embedding generation:
Prerequisites
- Linux distribution with a compatible CUDA version
- NVIDIA GPU with CUDA support
- A recent Rust compiler to compile Meilisearch from source
Steps
- Follow the guide to install the CUDA dependencies
- Clone Meilisearch:
git clone https://github.com/meilisearch/meilisearch.git
- Compile Meilisearch with the cuda feature:
cargo build --release --package meilisearch --features cuda
- In the freshly compiled Meilisearch, enable the vector store experimental feature:
curl \
-X PATCH 'http://localhost:7700/experimental-features/' \
-H 'Content-Type: application/json' \
--data-binary '{ "vectorStore": true }'
- Add a huggingFace embedder to the settings:
curl \
-X PATCH 'http://localhost:7700/indexes/your_index/settings/embedders' \
-H 'Content-Type: application/json' --data-binary \
'{ "default": { "source": "huggingFace" } }'
Done by @dureuill in #4304.
Improve indexing speed & reduce memory crashes
- Auto-batch the task deletions to reduce indexing time (#4316) @irevoire
- Improve indexing speed for the vector store (indexing for the hybrid search experimental feature is now more than 10 times faster) (#4332) @Kerollmops @irevoire
- Reduce memory usage, and therefore memory-related crashes, by capping the maximum memory used by the grenad sorters (#4388) @Kerollmops
Stabilize the scoreDetails feature
In v1.3.0, we introduced the experimental scoreDetails feature. We received enough positive feedback on it, and we are now stabilizing it: the feature is enabled by default.
View detailed scores per ranking rule for each document with the showRankingScoreDetails search parameter:
curl \
-X POST 'http://localhost:7700/indexes/movies/search' \
-H 'Content-Type: application/json' \
--data-binary '{ "q": "Batman Returns", "showRankingScoreDetails": true }'
When showRankingScoreDetails is set to true, returned documents include a _rankingScoreDetails field. This field contains score values for each ranking rule.
"_rankingScoreDetails": {
"words": {
"order": 0,
"matchingWords": 1,
"maxMatchingWords": 1,
"score": 1.0
},
"typo": {
"order": 1,
"typoCount": 0,
"maxTypoCount": 1,
"score": 1.0
},
"proximity": {
"order": 2,
"score": 1.0
},
"attribute": {
"order": 3,
"attributes_ranking_order": 0.8,
"attributes_query_word_order": 0.6363636363636364,
"score": 0.7272727272727273
},
"exactness": {
"order": 4,
"matchType": "noExactMatch",
"matchingWords": 0,
"maxMatchingWords": 1,
"score": 0.3333333333333333
}
}
Done by @dureuill in #4389.
Logs improvements
We made some changes regarding our logs to help with debugging and bug reporting.
Done by @irevoire in #4391
Log format change
⚠️ If you did any automation based on Meilisearch logs, be aware of the changes. More information here.
The default log format evolved slightly from this:
[2024-02-06T14:54:11Z INFO actix_server::builder] starting 10 workers
To this:
2024-02-06T13:58:14.710803Z INFO actix_server::builder: 200: starting 10 workers
Experimental: new routes to manage logs
This new version of Meilisearch introduces new experimental routes:
- POST /logs/stream: streams the logs happening in real-time. Requires two parameters:
  - target: selects which logs you're interested in. It takes the form of code_part=log_level. For example, index_scheduler=info
  - mode: selects the log format you want. Two options are available: human (basic logs) or profile (a much more detailed trace)
- DELETE /logs/stream: stops the listener from the Meilisearch perspective. Does not require any parameters.
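For example, a hedged sketch of opening a stream and then closing it (assumes the experimental route has been enabled as described in the section referenced below, and a local instance on the default port; the target value is only an example):
curl \
  -X POST 'http://localhost:7700/logs/stream' \
  -H 'Content-Type: application/json' \
  --data-binary '{ "target": "index_scheduler=info", "mode": "human" }'
curl -X DELETE 'http://localhost:7700/logs/stream'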
💡 More information in the New experimental routes section of this file.
⚠️ Some remarks on this POST /logs/stream route:
- You can have only one listener at a time
- Listening to the route doesn't seem to work with xh or httpie for the moment
- When killing the listener, it may stay installed on Meilisearch for some time, and you will need to call the DELETE /logs/stream route to get rid of it
🗣️ This feature is experimental, and we need your help to improve it! Share your thoughts and feedback on this GitHub discussion.
⚠️ Experimental features may be incompatible between Meilisearch versions.
Other improvements
- Related to the Prometheus experimental feature: add job variable to Grafana dashboard (#4330) @capJavert
Misc
- Dependencies upgrade
- Bump rustls-webpki from 0.101.3 to 0.101.7 (#4263)
- Bump h2 from 0.3.20 to 0.3.24 (#4345)
- Update the dependencies (#4332) @Kerollmops
- CIs and tests
- Update SDK test dependencies (#4293) @curquiza
- Documentation
- Add Setting API reminder in issue template (#4325) @ManyTheFish
- Update README (#4319) @codesmith-emmy
- Misc
- Fix compilation warnings (#4295) @irevoire
❤️ Thanks again to our external contributors:
- Meilisearch: @capJavert, @codesmith-emmy and @Gosti
1. meilisearch-linux-aarch64 121.03MB
2. meilisearch-linux-amd64 122.39MB
3. meilisearch-macos-amd64 114.02MB
4. meilisearch-macos-apple-silicon 112.48MB
5. meilisearch-windows-amd64.exe 113.59MB