
v2.0.0

argilla-io/argilla

Release date: 2024-07-31 14:49:24

Latest argilla-io/argilla release: v2.1.0 (2024-09-05 23:11:08)

🔆 Release highlights

One Dataset to rule them all

The main difference between Argilla 1.x and Argilla 2.x is that we've converted the previous dataset types tailored for specific NLP tasks into a single highly-configurable Dataset class.

With the new Dataset you can combine multiple fields and question types, so you can adapt the UI for your specific project. This offers you more flexibility, while making Argilla easier to learn and maintain.

[!IMPORTANT] If you want to continue using your legacy datasets in Argilla 2.x, you will need to convert them into v2 Datasets as explained in this migration guide. This includes: DatasetForTextClassification, DatasetForTokenClassification, and DatasetForText2Text.

FeedbackDatasets do not need to be converted as they are already compatible with the Argilla v2 format.

New SDK & documentation

We've redesigned our SDK to fit the new single Dataset and Record classes and, most importantly, to improve the user and developer experience.

The main goal of the new design is to make the SDK easier to use and learn, making it simpler and faster to configure your dataset and get it up and running.

Here's an example of what creating a Dataset looks like:

import argilla as rg
from datasets import load_dataset

# connect to your Argilla instance
client = rg.Argilla(
    api_url="<api_url>",
    api_key="<api_key>",
    # headers={"Authorization": f"Bearer {HF_TOKEN}"}  # uncomment for a private Hugging Face Space
)

# configure dataset settings
settings = rg.Settings(
    guidelines="Classify the reviews as positive or negative.",
    fields=[
        rg.TextField(
            name="review",
            title="Text from the review",
            use_markdown=False,
        ),
    ],
    questions=[
        rg.LabelQuestion(
            name="my_label",
            title="Is this review positive or negative?",
            labels=["positive", "negative"],
        )
    ],
)

# create the dataset in your Argilla instance
dataset = rg.Dataset(
    name="my_first_dataset",
    settings=settings,
    client=client,
)
dataset.create()

# get some data from the Hugging Face Hub and log the records
data = load_dataset("imdb", split="train[:100]").to_list()
dataset.records.log(records=data, mapping={"text": "review"})
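
The mapping argument in the call above links a key in the source data ("text") to a field defined in the dataset settings ("review"). As a rough mental model only, the renaming can be pictured with the sketch below. It is not Argilla's implementation: `apply_mapping` is a hypothetical helper, and the choice to pass unmapped keys through unchanged is an assumption of this sketch, not a statement about how the SDK treats them.

```python
def apply_mapping(record: dict, mapping: dict) -> dict:
    """Rename the keys of a source record according to `mapping`.

    Hypothetical helper for illustration; keys that are not in `mapping`
    are kept as they are (an assumption of this sketch).
    """
    return {mapping.get(key, key): value for key, value in record.items()}


# A row shaped like the rows of the "imdb" dataset used above.
row = {"text": "A great movie.", "label": 1}

# "text" is renamed to "review" so that it matches the TextField name.
mapped = apply_mapping(row, {"text": "review"})
```

If the source keys already match the field and question names in your settings, no renaming of this kind is needed.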

To learn more about this SDK and how it works, check out our revamped documentation: https://argilla-io.github.io/argilla/latest

We built this new documentation site from scratch, applying the Diátaxis framework and UX principles to make this version cleaner and the information easier to find.

New UI layout

We have also redesigned part of our UI for Argilla 2.0:

https://github.com/user-attachments/assets/2d959c8a-b4ac-446b-8326-bd66daa28816

Automatic task distribution

Argilla 2.0 also comes with an automated way to split the task of annotating a dataset among a team. Distribution is driven by the minimum number of submitted responses required per record (min_submitted).

By default, the minimum number of submitted responses is 1, but you can create a dataset with a different value:

settings = rg.Settings(
    guidelines="These are some guidelines.",
    fields=[
        rg.TextField(
            name="text",
        ),
    ],
    questions=[
        rg.LabelQuestion(
            name="label",
            labels=["label_1", "label_2", "label_3"]
        ),
    ],
    distribution=rg.TaskDistribution(min_submitted=3)
)

You can also change the value of an existing dataset as long as it has no responses. You can do this from the General tab inside the Dataset Settings page in the UI or from the SDK:

import argilla as rg

client = rg.Argilla(...)

dataset = client.datasets("my_dataset")

dataset.settings.distribution.min_submitted = 4

dataset.update()
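
To make the effect of `min_submitted` more concrete, here is a minimal sketch that runs without an Argilla server. It assumes that a record counts as completed once its number of submitted responses reaches the threshold and is pending otherwise. The function names `record_status` and `completion_ratio` and the status strings are illustrative assumptions, not part of the SDK.

```python
from typing import Iterable


def record_status(submitted: int, min_submitted: int = 1) -> str:
    """Return "completed" once a record has enough submitted responses, else "pending".

    Illustrative model only; the threshold rule is an assumption of this sketch.
    """
    if min_submitted < 1:
        raise ValueError("min_submitted must be at least 1")
    return "completed" if submitted >= min_submitted else "pending"


def completion_ratio(submitted_counts: Iterable[int], min_submitted: int = 1) -> float:
    """Fraction of records that count as completed under the given threshold."""
    counts = list(submitted_counts)
    if not counts:
        return 0.0
    done = sum(record_status(n, min_submitted) == "completed" for n in counts)
    return done / len(counts)


# Four records with 0, 1, 3 and 4 submitted responses, and a threshold of 3.
counts = [0, 1, 3, 4]
ratio = completion_ratio(counts, min_submitted=3)
```

Under this model, raising the threshold means each record needs more annotators before it counts as done, so the same set of responses covers a smaller share of the dataset.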

To learn more, check our guide on how to distribute the annotation task.

Easily deploy in Hugging Face Spaces

We've streamlined the deployment of an Argilla Space in the Hugging Face Hub. Now, there's no need to manage users and passwords. Follow these simple steps to create your Argilla Space:

https://github.com/user-attachments/assets/a57a8712-ef4e-45f3-8c38-7bbc47adf02b

New Contributors

Full Changelog: https://github.com/argilla-io/argilla/compare/v1.29.1...v2.0.0

