
assafelovic/gpt-researcher

Forks: 1919 Stars: 14539 (updated 2024-10-20 18:25:58)

license: Apache-2.0

Language: Python

LLM-based autonomous agent that conducts in-depth web research on any given topic

Latest release: v3.0.8 (2024-09-15 13:22:02)

🔎 GPT Researcher

GPT Researcher is an autonomous agent designed for comprehensive web and local research on any given task.

The agent can produce detailed, factual and unbiased research reports, with customization options for focusing on relevant resources and outlines. Inspired by the recent Plan-and-Solve and RAG papers, GPT Researcher addresses issues of misinformation, speed, determinism and reliability, offering more stable performance and greater speed through parallelized agent work rather than synchronous operations.

Our mission is to empower individuals and organizations with accurate, unbiased, and factual information by leveraging the power of AI.

Why GPT Researcher?

  • Reaching objective conclusions through manual research can take a long time, sometimes weeks, because finding the right resources and information is slow.
  • Current LLMs are trained on past, often outdated information and carry a high risk of hallucination, which makes them of little use for research tasks.
  • Current LLMs are limited to short token outputs, which are insufficient for long, detailed research reports (over 2,000 words).
  • Services that enable web searches, such as ChatGPT or Perplexity, only consider limited sources and content, which in some cases results in misinformation and shallow results.
  • Using only a selection of web sources can create bias in determining the right conclusions for research tasks.

Demo

https://github.com/user-attachments/assets/092e9e71-7e27-475d-8c4f-9dddd28934a3

Architecture

The main idea is to run 'planner' and 'execution' agents, where the planner generates questions for research, and the execution agents seek the most relevant information based on each generated research question. Finally, the planner filters and aggregates all related information and creates a research report.

The agents leverage both gpt-4o-mini and gpt-4o (128K context) to complete a research task. We optimize for cost by using each model only when necessary. The average research task takes about 2 minutes to complete and costs approximately $0.005.

More specifically:

  • Create a domain-specific agent based on the research query or task.
  • Generate a set of research questions that together form an objective opinion on any given task.
  • For each research question, trigger a crawler agent that scrapes online resources for information relevant to the given task.
  • For each scraped resource, summarize the relevant information and keep track of its sources.
  • Finally, filter and aggregate all summarized sources and generate a final research report.
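The planner/execution flow above can be sketched with Python's `asyncio`. This is a minimal, self-contained illustration under stated assumptions, not GPT Researcher's actual code: `plan_questions`, `research_question`, `write_report`, and `run_research` are hypothetical stand-ins that return canned strings instead of calling an LLM or a search API, and the example.com URLs are placeholders. What it does show is the structure: one execution task per research question, all run concurrently with `asyncio.gather`, followed by a filter-and-aggregate step.

```python
import asyncio
from dataclasses import dataclass, field
from urllib.parse import quote_plus


@dataclass
class Finding:
    """Summary of one research question plus the sources it came from."""
    question: str
    summary: str
    sources: list[str] = field(default_factory=list)


async def plan_questions(task: str) -> list[str]:
    # Hypothetical planner: a real agent would ask an LLM for sub-questions.
    return [f"{task}: background", f"{task}: recent developments", f"{task}: risks"]


async def research_question(question: str) -> Finding:
    # Hypothetical execution agent: a real one would search, scrape and summarize.
    await asyncio.sleep(0.1)  # stands in for network-bound scraping
    source = "https://example.com/search?q=" + quote_plus(question)
    return Finding(question, f"Summary for '{question}'", [source])


def write_report(task: str, findings: list[Finding]) -> str:
    # Filter: keep only findings backed by at least one source.
    relevant = [f for f in findings if f.sources]
    # Aggregate: one bullet per finding, citing its sources.
    lines = [f"Report: {task}"]
    for finding in relevant:
        lines.append(f"- {finding.summary} (sources: {', '.join(finding.sources)})")
    return "\n".join(lines)


async def run_research(task: str) -> str:
    questions = await plan_questions(task)
    # Execution agents run concurrently; gather returns results in input order.
    findings = await asyncio.gather(*(research_question(q) for q in questions))
    return write_report(task, list(findings))


if __name__ == "__main__":
    print(asyncio.run(run_research("why is Nvidia stock going up?")))
```

Because the three simulated scrapes overlap, the whole run takes roughly one sleep interval rather than three, which is the benefit of parallelized agent work over synchronous operations.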


Features

  • 📝 Generate research, outlines, resources and lessons reports with local documents and web sources
  • 📜 Can generate long and detailed research reports (over 2K words)
  • 🌐 Aggregates over 20 web sources per research task to form objective and factual conclusions
  • 🖥️ Includes both lightweight (HTML/CSS/JS) and production-ready (NextJS + Tailwind) UX/UI
  • 🔍 Scrapes web sources with JavaScript support
  • 📂 Keeps track of context and memory throughout the research process
  • 📄 Export research reports to PDF, Word and more...

📖 Documentation

Please see the documentation for full details on:

  • Getting started (installation, setting up the environment, simple examples)
  • Customization and configuration
  • How-To examples (demos, integrations, docker support)
  • Reference (full API docs)

⚙️ Getting Started

Installation

Step 0 - Install Python 3.11 or later. See here for a step-by-step guide.

Step 1 - Download the project and navigate to its directory

git clone https://github.com/assafelovic/gpt-researcher.git
cd gpt-researcher

Step 2 - Set up your API keys using one of two methods: exporting them directly or storing them in a .env file.

For a temporary setup on Linux or macOS (or a Bash-compatible shell such as Git Bash on Windows), use the export method:

export OPENAI_API_KEY={Your OpenAI API Key here}
export TAVILY_API_KEY={Your Tavily API Key here}

For a more permanent setup, create a .env file in the root of the gpt-researcher directory and add the same variables there (without export).

  • The default LLM is GPT, but you can use other LLMs such as Claude, Llama 3 (via Ollama), Gemini, Mistral and more. To learn how to change the LLM provider, see the LLMs documentation page. Please note: this project is optimized for OpenAI GPT models.
  • The default retriever is Tavily, but you can use other retrievers such as DuckDuckGo, Google, Bing, SearchApi, Serper, Searx, arXiv, Exa and more. To learn how to change the search provider, see the retrievers documentation page.

Quickstart

Step 1 - Install dependencies

pip install -r requirements.txt

Step 2 - Run the agent with FastAPI

python -m uvicorn main:app --reload

Step 3 - Go to http://localhost:8000 in any browser and enjoy researching!

To learn how to get started with Poetry or a virtual environment, check out the documentation page.

Run as PIP package

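pip install gpt-researcher

import asyncio

from gpt_researcher import GPTResearcher


async def main() -> None:
    query = "why is Nvidia stock going up?"
    researcher = GPTResearcher(query=query, report_type="research_report")
    # Conduct research on the given query
    research_result = await researcher.conduct_research()
    # Write the report from the gathered research
    report = await researcher.write_report()
    print(report)


# conduct_research() and write_report() are coroutines, so run them in an event loop
asyncio.run(main())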

For more examples and configurations, please refer to the PIP documentation page.

Run with Docker

Step 1 - Install Docker

Step 2 - Copy the '.env.example' file, add your API keys to the copy, and save it as '.env'.

Step 3 - Within the docker-compose file, comment out any services that you don't want to run with Docker.

docker-compose up --build

If that doesn't work, try running it without the dash:

docker compose up --build

Step 4 - By default, if you haven't commented out anything in your docker-compose file, this flow will start two processes:

  • the Python server running on localhost:8000
  • the React app running on localhost:3000

Visit localhost:3000 in any browser and enjoy researching!

📄 Research on Local Documents

You can instruct GPT Researcher to run research tasks based on your local documents. Currently supported file formats are PDF, plain text, CSV, Excel, Markdown, PowerPoint, and Word documents.

Step 1: Add the env variable DOC_PATH pointing to the folder where your documents are located.

export DOC_PATH="./my-docs"

Step 2:

  • If you're running the frontend app on localhost:8000, simply select "My Documents" from the "Report Source" dropdown.
  • If you're running GPT Researcher with the PIP package, pass the report_source argument as "local" when you instantiate the GPTResearcher class (see the code sample in the documentation).

👪 Multi-Agent Assistant

As AI evolves from prompt engineering and RAG to multi-agent systems, we're excited to introduce our new multi-agent assistant built with LangGraph.

With LangGraph, the depth and quality of the research process can be significantly improved by leveraging multiple agents with specialized skills. Inspired by the recent STORM paper, this project showcases how a team of AI agents can work together to conduct research on a given topic, from planning to publication.

An average run generates a 5-6 page research report in multiple formats such as PDF, Docx and Markdown.
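The hand-off between specialized agents can be pictured as a shared state passed along a sequence of steps, with a review loop before publication. The sketch below uses plain Python rather than LangGraph, and every role (`plan`, `research`, `review`, `revise`, `publish`) is a hypothetical stand-in that edits a dictionary instead of calling an LLM. It illustrates the planning-to-publication flow only and is not the project's actual agent graph.

```python
State = dict  # shared state passed from agent to agent


def plan(state: State) -> State:
    # Hypothetical planner: split the topic into report sections.
    state["sections"] = [f"{state['topic']}: overview", f"{state['topic']}: analysis"]
    return state


def research(state: State) -> State:
    # Hypothetical researcher: draft text for every planned section.
    state["draft"] = {s: f"Draft notes on {s}." for s in state["sections"]}
    return state


def review(state: State) -> State:
    # Hypothetical reviewer: flag sections whose draft is still too short.
    state["feedback"] = [s for s, text in state["draft"].items() if len(text) < 60]
    return state


def revise(state: State) -> State:
    # Hypothetical revisor: expand every flagged section.
    for section in state["feedback"]:
        state["draft"][section] += " Expanded with additional sourced detail."
    return state


def publish(state: State) -> State:
    # Hypothetical publisher: join the sections into one Markdown document.
    state["report"] = "\n\n".join(f"## {s}\n{t}" for s, t in state["draft"].items())
    return state


def run_pipeline(topic: str, max_revisions: int = 3) -> str:
    state: State = {"topic": topic}
    for step in (plan, research):
        state = step(state)
    # Review/revise loop: stop as soon as the reviewer has no feedback.
    for _ in range(max_revisions):
        state = review(state)
        if not state["feedback"]:
            break
        state = revise(state)
    return publish(state)["report"]


if __name__ == "__main__":
    print(run_pipeline("AI in healthcare"))
```

Capping the loop with `max_revisions` keeps a reviewer that is never satisfied from stalling the run, a common guard in agent graphs with cycles.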

Check it out here or head over to our documentation for more information.

🖥️ Frontend Applications

GPT Researcher now features an enhanced frontend to improve the user experience and streamline the research process. The frontend offers:

  • An intuitive interface for inputting research queries
  • Real-time progress tracking of research tasks
  • Interactive display of research findings
  • Customizable settings for tailored research experiences

Two deployment options are available:

  1. A lightweight static frontend served by FastAPI
  2. A feature-rich NextJS application for advanced functionality

For detailed setup instructions and more information about the frontend features, please visit our documentation page.

🚀 Contributing

We welcome contributions! Please check out the contributing guide if you're interested.

Please check out our roadmap page and reach out to us via our Discord community if you're interested in joining our mission.

✉️ Support / Contact us

🛡 Disclaimer

This project, GPT Researcher, is an experimental application and is provided "as-is" without any warranty, express or implied. We are sharing the code for academic purposes under the Apache 2.0 license. Nothing herein is academic advice, and it is NOT a recommendation for use in academic or research papers.

Our view on unbiased research claims:

  1. The main goal of GPT Researcher is to reduce incorrect and biased facts. How? We assume that the more sites we scrape, the lower the chance of incorrect data. By scraping multiple sites per research task and choosing the most frequent information, the chance that they are all wrong is extremely low.
  2. We do not aim to eliminate biases; we aim to reduce them as much as possible. We are here as a community to figure out the most effective human/LLM interactions.
  3. In research, people also tend toward bias, as most already have opinions on the topics they research. This tool scrapes many opinions and will evenly explain diverse views that a biased person would never have read.
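Point 1 rests on two ideas that can be made concrete: choosing the most frequent claim across sources is a simple majority vote, and if each of n sources were wrong independently with probability p, the chance that all of them are wrong would be p to the power n. The helper names below are hypothetical and the independence assumption is illustrative only; real web sources often copy one another, so their errors are correlated and the true risk is higher than this estimate.

```python
from collections import Counter


def most_frequent_claim(claims: list[str]) -> tuple[str, float]:
    """Return the most common claim and the share of sources that agree with it."""
    if not claims:
        raise ValueError("need at least one claim")
    # Normalize lightly so trivial formatting differences do not split the vote.
    counts = Counter(c.strip().lower() for c in claims)
    claim, votes = counts.most_common(1)[0]
    return claim, votes / len(claims)


def prob_all_wrong(p_wrong: float, n_sources: int) -> float:
    """Chance that n independent sources are all wrong, each with probability p_wrong."""
    return p_wrong ** n_sources


if __name__ == "__main__":
    print(most_frequent_claim(["Paris", "paris", "Lyon", "Paris "]))  # ('paris', 0.75)
    print(prob_all_wrong(0.2, 5))  # about 0.00032
```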


Recent releases (data updated 2024-09-21 19:29:40):

2024-09-15 13:22:02 v3.0.8

2024-09-08 02:04:47 v3.0.7

2024-08-26 16:05:27 v3.0.6

2024-08-18 14:11:13 v.3.0.5

2024-08-12 20:38:17 v3.0.4

2024-08-04 15:47:28 v3.0.3

2024-07-29 13:42:41 v3.0.2

2024-07-21 15:50:51 v.0.3.1

2024-07-15 18:22:43 v0.3.0

2024-07-06 21:54:54 v0.2.8

Topics:

agent, ai, automation, gpt-4, hacktoberfest, openai, python, research, search, webscraping
