Fork: 291 Star: 4202 (updated 2023-06-03 13:11:30)

license: MIT

Language: Python.

A web interface for chatting with Alpaca through llama.cpp. Fully dockerized, with an easy to use API.

Latest release: 0.1.3 (2023-05-26 09:32:25)


Serge - LLaMA made easy 🦙


A chat interface based on llama.cpp for running Alpaca models. Entirely self-hosted, no API keys needed. Fits on 4GB of RAM and runs on the CPU.

  • SvelteKit frontend
  • Redis for storing chat history & parameters
  • FastAPI + LangChain for the API, wrapping calls to llama.cpp using the Python bindings


Getting started

Setting up Serge is very easy. You can start it with a single command:

docker run -d -v weights:/usr/src/app/weights -v datadb:/data/db/ -p 8008:8008

Then just go to http://localhost:8008/ and you're good to go!

The API documentation can be found at http://localhost:8008/api/docs


On Windows, make sure you have Docker Desktop installed, WSL2 configured, and enough free RAM to run models (see the memory note below).
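For scripted setups, the `docker run` command above can also be assembled and launched from Python with the standard `subprocess` module. This is a minimal sketch: the volume names, mount paths, and port are taken from the command above, while `build_run_command` and `start_serge` are hypothetical helpers, and the image reference is a required argument because the command as shown here does not name one.

```python
import subprocess

# Volume mounts and port mapping taken from the docker run command above.
VOLUMES = {
    "weights": "/usr/src/app/weights",  # model weights
    "datadb": "/data/db/",              # database volume
}
PORT = 8008


def build_run_command(image: str) -> list:
    """Hypothetical helper: build the detached `docker run` argument list for `image`."""
    cmd = ["docker", "run", "-d"]
    for volume, mount_point in VOLUMES.items():
        cmd += ["-v", f"{volume}:{mount_point}"]
    cmd += ["-p", f"{PORT}:{PORT}", image]
    return cmd


def start_serge(image: str) -> str:
    """Run the container and return the container ID that `docker run -d` prints."""
    result = subprocess.run(
        build_run_command(image), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

Building the argument list separately from running it keeps the command easy to inspect or log before anything is started; passing a list (rather than a shell string) also avoids shell quoting issues.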

Kubernetes & Docker Compose

Instructions for setting up Serge on Kubernetes or Docker Compose can be found in the wiki.


Currently the following models are supported:

  • GPT4-Alpaca-LoRA-30B
  • Alpaca-LoRA-65B
  • OpenAssistant-30B
  • GPT4All-13B
  • Stable-Vicuna-13B
  • Guanaco-7B
  • Guanaco-13B
  • Guanaco-33B
  • Guanaco-65B

If you have existing weights from another project, you can add them to the serge_weights volume using docker cp.
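In the `docker run` command above, the weights volume is mounted at /usr/src/app/weights inside the container, so one way to add a file is `docker cp SRC CONTAINER:DEST` into that path of a running container. The sketch below wraps that in Python; the container name and weights file name in the example are hypothetical, and it assumes the same mount point as the `docker run` command above.

```python
import subprocess
from pathlib import Path

# Mount point of the weights volume inside the container (from the docker run command).
WEIGHTS_DIR = "/usr/src/app/weights"


def build_copy_command(weights_file: str, container: str) -> list:
    """Build a `docker cp SRC CONTAINER:DEST` command for a single weights file."""
    filename = Path(weights_file).name
    return ["docker", "cp", weights_file, f"{container}:{WEIGHTS_DIR}/{filename}"]


def copy_weights(weights_file: str, container: str) -> None:
    """Copy a local weights file into the running container's weights directory."""
    subprocess.run(build_copy_command(weights_file, container), check=True)
```

For example, `copy_weights("models/my-model.bin", "serge")` would copy the file into the weights directory of a container named `serge` (both names hypothetical).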

⚠️ A note on memory usage

LLaMA will just crash if you don't have enough available memory for your model.

  • 7B requires about 4.5GB of free RAM
  • 13B requires about 12GB free
  • 30B requires about 20GB free
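Since running out of memory crashes the model rather than failing gracefully, it can help to check available RAM before picking a model size. This is a minimal sketch using only the approximate figures listed above (no figure is given here for 65B, so it is left out); `models_that_fit`, `largest_model_that_fits`, and the Linux-only `available_ram_gb` reader are hypothetical helpers, not part of Serge.

```python
from typing import List, Optional

# Approximate free RAM (GB) needed per model size, from the list above.
REQUIRED_FREE_GB = {"7B": 4.5, "13B": 12.0, "30B": 20.0}


def models_that_fit(free_gb: float) -> List[str]:
    """Return the listed model sizes whose RAM requirement fits in `free_gb`."""
    return [size for size, need in REQUIRED_FREE_GB.items() if free_gb >= need]


def largest_model_that_fits(free_gb: float) -> Optional[str]:
    """Return the largest listed model size that fits, or None if none does."""
    fitting = models_that_fit(free_gb)
    return fitting[-1] if fitting else None


def available_ram_gb(meminfo_path: str = "/proc/meminfo") -> float:
    """Linux only: read MemAvailable (reported in kB) and convert it to GiB."""
    with open(meminfo_path) as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) / (1024 ** 2)
    raise RuntimeError("MemAvailable not found in " + meminfo_path)
```

On Linux, `largest_model_that_fits(available_ram_gb())` gives a rough upper bound; leave some headroom, since the figures above are approximate.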


Feel free to join the Discord if you need help with the setup.


Serge is always open for contributions! If you catch a bug or have a feature idea, feel free to open an issue or a PR.

If you want to run Serge in development mode (with hot-module reloading for Svelte and autoreload for FastAPI), you can do so like this:

git clone
DOCKER_BUILDKIT=1 docker compose -f up -d --build

You can test the production image with:

DOCKER_BUILDKIT=1 docker compose up -d --build

What's next

  • Front-end to interface with the API
  • Pass model parameters when creating a chat
  • Manager for model files
  • Support for other models
  • LangChain integration
  • User profiles & authentication

And a lot more!

Recent releases (data updated 2023-06-04 01:46:20):

2023-05-26 09:32:25 0.1.3

2023-04-29 20:23:32 0.1.2

2023-04-26 14:10:02 0.1.1

2023-04-26 03:13:26 0.1.0

2023-04-17 23:13:03 0.0.5

2023-04-03 02:41:07 0.0.4

2023-03-29 14:29:56 0.0.3

2023-03-26 23:18:58 0.0.2b

2023-03-26 23:00:01 0.0.2

2023-03-25 22:39:25 0.0.1


alpaca, docker, fastapi, llama, llamacpp, nginx, python, svelte, sveltekit, tailwindcss, web
