infiniflow/ragflow
Fork: 2052 Star: 20876 (updated 2024-10-26 17:19:56)
license: Apache-2.0
Language: Python
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Latest release: v0.12.0 (2024-09-30 13:19:03)
Document | Roadmap | Twitter | Discord | Demo
📕 Table of Contents
- 💡 What is RAGFlow?
- 🎮 Demo
- 🔥 Latest Updates
- 🌟 Key Features
- 🔎 System Architecture
- 🎬 Get Started
- 🔧 Configurations
- 🔧 Build a Docker image without embedding models
- 🔧 Build a Docker image including embedding models
- 🔨 Launch service from source for development
- 📚 Documentation
- 📜 Roadmap
- 🏄 Community
- 🙌 Contributing
💡 What is RAGFlow?
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. It offers a streamlined RAG workflow for businesses of any scale, combining LLM (Large Language Models) to provide truthful question-answering capabilities, backed by well-founded citations from various complex formatted data.
🎮 Demo
Try our demo at https://demo.ragflow.io.
🔥 Latest Updates
- 2024-09-29 Optimizes multi-round conversations.
- 2024-09-13 Adds search mode for knowledge base Q&A.
- 2024-09-09 Adds a medical consultant agent template.
- 2024-08-22 Supports text-to-SQL statements through RAG.
- 2024-08-02 Supports GraphRAG inspired by graphrag and mind map.
🎉 Stay Tuned
⭐️ Star our repository to stay up-to-date with exciting new features and improvements! Get instant notifications for new releases! 🌟
🌟 Key Features
🍭 "Quality in, quality out"
- Deep document understanding-based knowledge extraction from unstructured data with complicated formats.
- Finds the "needle in a data haystack" across a practically unlimited number of tokens.
🍱 Template-based chunking
- Intelligent and explainable.
- Plenty of template options to choose from.
🌱 Grounded citations with reduced hallucinations
- Visualization of text chunking to allow human intervention.
- Quick view of the key references and traceable citations to support grounded answers.
🍔 Compatibility with heterogeneous data sources
- Supports Word, slides, Excel, txt, images, scanned copies, structured data, web pages, and more.
🛀 Automated and effortless RAG workflow
- Streamlined RAG orchestration catering to both individuals and large businesses.
- Configurable LLMs as well as embedding models.
- Multiple recall paired with fused re-ranking.
- Intuitive APIs for seamless integration with business.
🔎 System Architecture
🎬 Get Started
📝 Prerequisites
- CPU >= 4 cores
- RAM >= 16 GB
- Disk >= 50 GB
- Docker >= 24.0.0 & Docker Compose >= v2.26.1
If you have not installed Docker on your local machine (Windows, Mac, or Linux), see Install Docker Engine.
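The hardware thresholds above can be checked with a short script. The following is a minimal sketch, not part of RAGFlow: the function names `check_prerequisites` and `probe_host` are hypothetical, and it assumes that "Disk" means free space on the given path and that GB means 10^9 bytes. The `os.sysconf` names used for RAM are platform-dependent and are assumed to be available (as on Linux).

```python
import os
import shutil

# Thresholds copied from the prerequisites list above.
MIN_CPU_CORES = 4
MIN_RAM_GB = 16
MIN_DISK_GB = 50


def check_prerequisites(cpu_cores: int, ram_gb: float, disk_gb: float) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if cpu_cores < MIN_CPU_CORES:
        problems.append(f"CPU: {cpu_cores} cores, need >= {MIN_CPU_CORES}")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {ram_gb:.1f} GB, need >= {MIN_RAM_GB}")
    if disk_gb < MIN_DISK_GB:
        problems.append(f"Disk: {disk_gb:.1f} GB, need >= {MIN_DISK_GB}")
    return problems


def probe_host(path: str = "/") -> tuple[int, float, float]:
    """Measure the local machine (assumption: a platform where these sysconf names exist)."""
    cpu_cores = os.cpu_count() or 0
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    # Assumption: the disk requirement refers to free space, not total capacity.
    disk_bytes = shutil.disk_usage(path).free
    gb = 10 ** 9
    return cpu_cores, ram_bytes / gb, disk_bytes / gb
```

Calling `check_prerequisites(*probe_host())` returns an empty list when the machine meets all three thresholds. The Docker and Docker Compose versions are not covered by this sketch.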
🚀 Start up the server
-
Ensure vm.max_map_count >= 262144:
To check the value of vm.max_map_count:
$ sysctl vm.max_map_count
Reset vm.max_map_count to a value at least 262144 if it is not.
# In this case, we set it to 262144:
$ sudo sysctl -w vm.max_map_count=262144
This change will be reset after a system reboot. To ensure your change remains permanent, add or update the vm.max_map_count value in /etc/sysctl.conf accordingly:
vm.max_map_count=262144
-
Clone the repo:
$ git clone https://github.com/infiniflow/ragflow.git
-
Start up the server using the pre-built Docker images:
The command below downloads the dev version Docker image for RAGFlow slim (dev-slim). Note that RAGFlow slim Docker images do not include embedding models or Python libraries and hence are approximately 1GB in size.
$ cd ragflow/docker
$ docker compose -f docker-compose.yml up -d
- To download a RAGFlow slim Docker image of a specific version, update the RAGFLOW_IMAGE variable in docker/.env to your desired version. For example, RAGFLOW_IMAGE=infiniflow/ragflow:v0.12.0-slim. After making this change, rerun the command above to initiate the download.
- To download the dev version of RAGFlow Docker image including embedding models and Python libraries, update the RAGFLOW_IMAGE variable in docker/.env to RAGFLOW_IMAGE=infiniflow/ragflow:dev. After making this change, rerun the command above to initiate the download.
- To download a specific version of RAGFlow Docker image including embedding models and Python libraries, update the RAGFLOW_IMAGE variable in docker/.env to your desired version. For example, RAGFLOW_IMAGE=infiniflow/ragflow:v0.12.0. After making this change, rerun the command above to initiate the download.
NOTE: A RAGFlow Docker image that includes embedding models and Python libraries is approximately 9GB in size and may take significantly longer to load.
-
Check the server status after having the server up and running:
$ docker logs -f ragflow-server
The following output confirms a successful launch of the system:
     ____   ___    ______ ______ __
    / __ \ /   |  / ____// ____// /____  _      __
   / /_/ // /| | / / __ / /_   / // __ \| | /| / /
  / _, _// ___ |/ /_/ // __/  / // /_/ /| |/ |/ /
 /_/ |_|/_/  |_|\____//_/    /_/ \____/ |__/|__/
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:9380
 * Running on http://x.x.x.x:9380
 INFO:werkzeug:Press CTRL+C to quit
If you skip this confirmation step and directly log in to RAGFlow, your browser may prompt a network abnormal error because, at that moment, your RAGFlow may not be fully initialized.
-
In your web browser, enter the IP address of your server and log in to RAGFlow.
With the default settings, you only need to enter http://IP_OF_YOUR_MACHINE (sans port number), as the default HTTP serving port 80 can be omitted when using the default configurations.
-
In service_conf.yaml, select the desired LLM factory in user_default_llm and update the API_KEY field with the corresponding API key.
See llm_api_key_setup for more information.
The show is on!
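Two of the checks in the steps above (the kernel setting and the launch confirmation) can be automated. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, `parse_max_map_count` assumes the usual `vm.max_map_count = <number>` output format of `sysctl`, and `server_ready` simply looks for the `* Running on` lines from the sample log above rather than probing the service itself.

```python
import re
import subprocess

MIN_MAX_MAP_COUNT = 262144  # value required in the first step above


def parse_max_map_count(sysctl_output: str) -> int:
    """Extract the integer from output such as 'vm.max_map_count = 65530'."""
    match = re.search(r"vm\.max_map_count\s*=\s*(\d+)", sysctl_output)
    if match is None:
        raise ValueError(f"unexpected sysctl output: {sysctl_output!r}")
    return int(match.group(1))


def max_map_count_ok(sysctl_output: str) -> bool:
    """True when the reported value is at least the required minimum."""
    return parse_max_map_count(sysctl_output) >= MIN_MAX_MAP_COUNT


def server_ready(log_text: str) -> bool:
    """True once the log contains a 'Running on' line like the sample output."""
    return "* Running on" in log_text


def run_and_capture(args: list[str]) -> str:
    """Run a command and return stdout and stderr combined as text."""
    result = subprocess.run(args, capture_output=True, text=True, check=False)
    return result.stdout + result.stderr
```

For example, `max_map_count_ok(run_and_capture(["sysctl", "vm.max_map_count"]))` performs the kernel check, and `server_ready(run_and_capture(["docker", "logs", "ragflow-server"]))` performs the launch check. Both output streams are combined because the server's log lines may be written to standard error.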
🔧 Configurations
When it comes to system configurations, you will need to manage the following files:
- .env: Keeps the fundamental setups for the system, such as SVR_HTTP_PORT, MYSQL_PASSWORD, and MINIO_PASSWORD.
- service_conf.yaml: Configures the back-end services.
- docker-compose.yml: The system relies on docker-compose.yml to start up.
You must ensure that changes to the .env file are consistent with the settings in the service_conf.yaml file.
The ./docker/README file provides a detailed description of the environment settings and service configurations, and you are REQUIRED to ensure that all environment settings listed in the ./docker/README file are aligned with the corresponding configurations in the service_conf.yaml file.
To update the default HTTP serving port (80), go to docker-compose.yml and change 80:80 to <YOUR_SERVING_PORT>:80.
Updates to the above configurations require a reboot of all containers to take effect:
$ docker compose -f docker/docker-compose.yml up -d
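The requirement that .env and service_conf.yaml stay aligned can be checked mechanically. The following is a minimal sketch, not a RAGFlow tool: `parse_env` handles only simple `KEY=VALUE` lines, and the `ENV_TO_SERVICE_CONF` mapping (which section and field of service_conf.yaml each variable corresponds to) is an assumption made for illustration. Verify it against ./docker/README before relying on it. The service configuration is passed in as an already-loaded dictionary, so the sketch does not depend on a YAML library.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are skipped."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


# Hypothetical mapping from .env keys to (section, field) in service_conf.yaml.
ENV_TO_SERVICE_CONF = {
    "MYSQL_PASSWORD": ("mysql", "password"),
    "MINIO_PASSWORD": ("minio", "password"),
}


def find_mismatches(env: dict[str, str], service_conf: dict) -> list[str]:
    """List the .env keys whose value differs from the mapped service_conf field."""
    mismatches = []
    for env_key, (section, field) in ENV_TO_SERVICE_CONF.items():
        if env_key not in env:
            continue
        actual = str(service_conf.get(section, {}).get(field))
        if env[env_key] != actual:
            mismatches.append(f"{env_key} != {section}.{field}")
    return mismatches
```

An empty list from `find_mismatches` means the mapped values agree. Any entry names the variable and the field that need to be reconciled before restarting the containers.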
🔧 Build a Docker image without embedding models
This image is approximately 1 GB in size and relies on external LLM and embedding services.
git clone https://github.com/infiniflow/ragflow.git
cd ragflow/
pip3 install huggingface-hub nltk
python3 download_deps.py
docker build -f Dockerfile.slim -t infiniflow/ragflow:dev-slim .
🔧 Build a Docker image including embedding models
This image is approximately 9 GB in size. As it includes embedding models, it relies on external LLM services only.
git clone https://github.com/infiniflow/ragflow.git
cd ragflow/
pip3 install huggingface-hub nltk
python3 download_deps.py
docker build -f Dockerfile -t infiniflow/ragflow:dev .
🔨 Launch service from source for development
-
Install Poetry, or skip this step if it is already installed:
curl -sSL https://install.python-poetry.org | python3 -
-
Clone the source code and install Python dependencies:
git clone https://github.com/infiniflow/ragflow.git
cd ragflow/
export POETRY_VIRTUALENVS_CREATE=true POETRY_VIRTUALENVS_IN_PROJECT=true
~/.local/bin/poetry install --sync --no-root # install RAGFlow dependent python modules
-
Launch the dependent services (MinIO, Elasticsearch, Redis, and MySQL) using Docker Compose:
docker compose -f docker/docker-compose-base.yml up -d
Add the following line to /etc/hosts to resolve all hosts specified in docker/service_conf.yaml to 127.0.0.1:
127.0.0.1 es01 mysql minio redis
In docker/service_conf.yaml, update mysql port to 5455 and es port to 1200, as specified in docker/.env.
-
If you cannot access HuggingFace, set the HF_ENDPOINT environment variable to use a mirror site:
export HF_ENDPOINT=https://hf-mirror.com
-
Launch backend service:
source .venv/bin/activate
export PYTHONPATH=$(pwd)
bash docker/launch_backend_service.sh
-
Install frontend dependencies:
cd web
npm install --force
-
Configure the frontend by updating proxy.target in .umirc.ts to http://127.0.0.1:9380.
-
Launch frontend service:
npm run dev
The following output confirms a successful launch of the system:
📚 Documentation
📜 Roadmap
See the RAGFlow Roadmap 2024
🏄 Community
🙌 Contributing
RAGFlow flourishes via open-source collaboration. In this spirit, we embrace diverse contributions from the community. If you would like to take part, review our Contribution Guidelines first.
Recent releases (data updated 2024-10-26 02:14:11):
2024-09-30 13:19:03 v0.12.0
2024-09-14 19:05:20 v0.11.0
2024-08-23 16:53:00 v0.10.0
2024-08-06 10:42:40 v0.9.0
2024-07-08 19:53:59 v0.8.0
2024-05-30 19:12:54 v0.7.0
2024-05-21 11:25:34 v0.6.0
2024-05-08 12:02:04 v0.5.0
2024-04-26 20:16:20 v0.4.0
2024-04-26 09:59:54 v0.3.2
Topics:
agent, agents, ai-search, chatbot, chatgpt, data-pipelines, deep-learning, document-parser, document-understanding, genai, graph, graphrag, llm, nlp, pdf-to-text, preprocessing, rag, retrieval-augmented-generation, table-structure-recognition, text2sql