v5.0.0
Released: 2024-09-12 00:25:43
Latest Future-House/paper-qa release: v5.0.3 (2024-09-14 11:48:33)
New Features
- Automatic population of metadata: PDF metadata is automatically retrieved from a variety of providers, including BibTeX entries, citation counts, journal quality assessments, and retraction notices
- Full-text search: A major difference between our published work and this repo is the ability to search over all of the scientific literature. We've brought the OSS version closer by adding full-text keyword search via tantivy. You can now index and search many papers before embedding, which makes it feasible to ingest many papers.
- Unified settings management: You can now save and load settings, which makes it easier for us to distribute settings for various PaperQA2 tasks. Examples include writing Wikipedia articles, identifying contradictions, and obtaining structured data
- CLI: We've added a CLI that uses persistent parsings and indexes, making it much easier to ask questions of a folder of PDFs
- LiteLLM: We've adopted LiteLLM as the LLM wrapper of choice. This means we now support many LLM APIs directly, with only the model string changing. It also means we now have "routers" that can handle fallbacks, API rate limiting, and retries.
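To make the "router" idea concrete, here is a minimal sketch of fallbacks and retries in plain Python. It is not LiteLLM's API: the function `complete_with_fallbacks`, the exception `AllModelsFailed`, the `call_model` callable, and the model names are all hypothetical and exist only to illustrate the behavior described above.

```python
from typing import Callable, Sequence


class AllModelsFailed(RuntimeError):
    """Raised when every model in the fallback chain has been exhausted."""


def complete_with_fallbacks(
    prompt: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],
    retries_per_model: int = 2,
) -> tuple[str, str]:
    """Try each model in order, retrying a failing model before moving on.

    Returns (model_name, completion). `call_model` is a hypothetical
    stand-in for a real LLM client call.
    """
    errors: list[str] = []
    for model in models:
        for attempt in range(1, retries_per_model + 1):
            try:
                return model, call_model(model, prompt)
            except Exception as exc:  # a real router would catch narrower errors
                errors.append(f"{model} attempt {attempt}: {exc}")
    raise AllModelsFailed("; ".join(errors))


def fake_call(model: str, prompt: str) -> str:
    """Fake backend: the primary model always fails, the fallback succeeds."""
    if model == "primary-model":
        raise TimeoutError("simulated rate limit")
    return f"[{model}] answer to: {prompt}"


used_model, answer = complete_with_fallbacks(
    "What is PaperQA2?", ["primary-model", "fallback-model"], fake_call
)
print(used_model, answer)
```

In this sketch only the list of model strings changes between providers, which mirrors the point made above about swapping models by name. A production router would also add backoff between retries and rate limiting, which are omitted here.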
Improvements
- More modern agent frameworks
- Reduction in dependencies
- Removed code duplicated by litellm
- Many improvements on code style and best practices
Regressions/Deprecation
We've removed the following features to keep our library focused:
- doc_match - we do not have enough data to support that this method actually helps for very large corpuses
- LangchainVectorStore - We no longer support more complex vector stores via Langchain like FAISS. Instead, we only support Numpy vector stores. We never found the paradigm of very large vector stores to be better than keyword search -> vector search -> LLM reranking and thus removed the code
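The keyword search -> vector search -> LLM reranking flow mentioned above can be sketched as three narrowing stages. This is a toy illustration under stated assumptions, not PaperQA's implementation: the keyword stage is a simple term-overlap count standing in for tantivy, the "embedding" is a bag-of-words counter standing in for a real embedding model, and the rerank scorer `fake_llm_score` is a hypothetical stub standing in for an LLM call. All function names are invented for this example.

```python
import math
import re
from collections import Counter
from typing import Callable


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_filter(query: str, docs: dict[str, str], limit: int) -> list[str]:
    """Stage 1: keep the docs sharing the most terms with the query."""
    terms = set(tokenize(query))
    hits = []
    for name, text in docs.items():
        overlap = len(terms & set(tokenize(text)))
        if overlap > 0:
            hits.append((overlap, name))
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in hits[:limit]]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (a real system would use an embedding model)."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vector_search(query: str, names: list[str], docs: dict[str, str], limit: int) -> list[str]:
    """Stage 2: rank only the keyword survivors by vector similarity."""
    query_vec = embed(query)
    ranked = sorted(names, key=lambda n: (-cosine(query_vec, embed(docs[n])), n))
    return ranked[:limit]


def rerank(query: str, names: list[str], docs: dict[str, str],
           score_fn: Callable[[str, str], float]) -> list[str]:
    """Stage 3: reorder the short list with a (stubbed) LLM relevance score."""
    return sorted(names, key=lambda n: (-score_fn(query, docs[n]), n))


def fake_llm_score(query: str, text: str) -> float:
    """Hypothetical stand-in for an LLM judging relevance."""
    return 1.0 if "tantivy" in text.lower() else 0.5


docs = {
    "a": "Tantivy provides full-text keyword search over indexed papers",
    "b": "Vector embeddings allow semantic search over paper chunks",
    "c": "A recipe for sourdough bread with a long fermentation",
    "d": "Keyword search narrows many papers before embedding",
}
query = "keyword search before embedding papers"

candidates = keyword_filter(query, docs, limit=3)
shortlist = vector_search(query, candidates, docs, limit=2)
final = rerank(query, shortlist, docs, fake_llm_score)
print(candidates, shortlist, final)
```

Each stage only sees what the previous one kept, so the expensive steps (embedding, then the LLM call) run on progressively fewer documents. That is the practical argument for this ordering over one very large vector store.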
Detailed Changes:
- typo by @oganm in https://github.com/Future-House/paper-qa/pull/303
- Updated readme and models by @mskarlin in https://github.com/Future-House/paper-qa/pull/305
- Add Client (external API) Module For Enhanced Metadata by @mskarlin in https://github.com/Future-House/paper-qa/pull/306
- Agentic workflows, locally indexed search, and CLI by @mskarlin in https://github.com/Future-House/paper-qa/pull/309
- Add new unpaywall provider by @mskarlin in https://github.com/Future-House/paper-qa/pull/310
- Rollback search fields to list and dynamically compute md5 hash in tests by @mskarlin in https://github.com/Future-House/paper-qa/pull/311
- Refactor to breakout config from rest of code by @whitead in https://github.com/Future-House/paper-qa/pull/289
- Changed to rely on litellm for computing cost by @whitead in https://github.com/Future-House/paper-qa/pull/321
- Fixing LLMModel.axyz_iter type hints by @jamesbraza in https://github.com/Future-House/paper-qa/pull/324
- CLI Fixes by @whitead in https://github.com/Future-House/paper-qa/pull/322
- blackened code to prevent IDE scrolling by @jamesbraza in https://github.com/Future-House/paper-qa/pull/330
- Optimized import paths by @jamesbraza in https://github.com/Future-House/paper-qa/pull/331
- Removed pytest-mock plugin by @jamesbraza in https://github.com/Future-House/paper-qa/pull/328
- Adding pytest-xdist plugin by @jamesbraza in https://github.com/Future-House/paper-qa/pull/329
- Passing mypy by @jamesbraza in https://github.com/Future-House/paper-qa/pull/332
- Removing make_chain in favor of run_prompt by @jamesbraza in https://github.com/Future-House/paper-qa/pull/325
- Readme updates by @mskarlin in https://github.com/Future-House/paper-qa/pull/323
- Adding refurb tool, and lint CI by @jamesbraza in https://github.com/Future-House/paper-qa/pull/333
- Fixing arg ordering after #325 by @jamesbraza in https://github.com/Future-House/paper-qa/pull/334
- Fixing parse_text after #332 by @jamesbraza in https://github.com/Future-House/paper-qa/pull/335
- Fixing union attr error by @jamesbraza in https://github.com/Future-House/paper-qa/pull/338
- Check if a journal name starts with "the" by @geemi725 in https://github.com/Future-House/paper-qa/pull/320
- Fixing two more tests by @jamesbraza in https://github.com/Future-House/paper-qa/pull/340
- All Ruff ANN autofixes by @jamesbraza in https://github.com/Future-House/paper-qa/pull/341
- Adding in .mailmap by @jamesbraza in https://github.com/Future-House/paper-qa/pull/342
- Remove cassettes which aren't needed by @mskarlin in https://github.com/Future-House/paper-qa/pull/339
- Add configs for contracrow + wikicrow by @mskarlin in https://github.com/Future-House/paper-qa/pull/336
- Removed LangchainVectorStore, llms extra, and fixing up README by @jamesbraza in https://github.com/Future-House/paper-qa/pull/343
- Dropping requests dependency by @jamesbraza in https://github.com/Future-House/paper-qa/pull/346
- Removed html2text requirement by @jamesbraza in https://github.com/Future-House/paper-qa/pull/347
- Requiring Python 3.11+ by @jamesbraza in https://github.com/Future-House/paper-qa/pull/348
- Did one revision at README by @whitead in https://github.com/Future-House/paper-qa/pull/344
- Renaming fitz to pymupdf by @mskarlin in https://github.com/Future-House/paper-qa/pull/350
- Better control flow in litellm_get_search_query by @jamesbraza in https://github.com/Future-House/paper-qa/pull/351
- Recurse into directories; catch empty documents by @sidnarayanan in https://github.com/Future-House/paper-qa/pull/352
- Move configure_cli_logging such that it's not called twice by @mskarlin in https://github.com/Future-House/paper-qa/pull/353
- Cleaning up dependencies by @jamesbraza in https://github.com/Future-House/paper-qa/pull/354
- Fixed code in README by @whitead in https://github.com/Future-House/paper-qa/pull/355
- Added citation and paper URL by @whitead in https://github.com/Future-House/paper-qa/pull/357
- aviary and ldp for agents over langchain by @jamesbraza in https://github.com/Future-House/paper-qa/pull/358
- Adds retraction status by @geemi725 in https://github.com/Future-House/paper-qa/pull/314
- Adding pylint by @jamesbraza in https://github.com/Future-House/paper-qa/pull/349
- Added account for cost info by @whitead in https://github.com/Future-House/paper-qa/pull/360
New Contributors
- @oganm made their first contribution in https://github.com/Future-House/paper-qa/pull/303
- @geemi725 made their first contribution in https://github.com/Future-House/paper-qa/pull/320
- @sidnarayanan made their first contribution in https://github.com/Future-House/paper-qa/pull/352
Full Changelog: https://github.com/Future-House/paper-qa/compare/v4.9.0...vnew