janhq / nitro
- Monday, January 8, 2024, 00:02:23
A fast, lightweight, embeddable inference engine to supercharge your apps with local AI. OpenAI-compatible API
Documentation - API Reference - Changelog - Bug reports - Discord
⚠️ Nitro is currently in Development: Expect breaking changes and bugs!
Nitro is a high-efficiency C++ inference engine for edge computing, powering Jan. It is lightweight and embeddable, ideal for product integration.
The zipped Nitro binary is only ~3 MB with few to no dependencies (CUDA is needed only if you use a GPU, for example), making it well suited for any edge/server deployment 👍.
Read more about Nitro at https://nitro.jan.ai/
.
├── controllers
├── docs
├── llama.cpp -> Upstream llama C++
├── nitro_deps -> Dependencies of the Nitro project as a sub-project
└── utils
Step 1: Install Nitro
For Linux and MacOS
curl -sfL https://raw.githubusercontent.com/janhq/nitro/main/install.sh | sudo /bin/bash -
For Windows
powershell -Command "& { Invoke-WebRequest -Uri 'https://raw.githubusercontent.com/janhq/nitro/main/install.bat' -OutFile 'install.bat'; .\install.bat; Remove-Item -Path 'install.bat' }"
Step 2: Downloading a Model
mkdir model && cd model
wget -O llama-2-7b-model.gguf https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_M.gguf?download=true
Step 3: Run Nitro server
nitro
Step 4: Load model
curl http://localhost:3928/inferences/llamacpp/loadmodel \
-H 'Content-Type: application/json' \
-d '{
"llama_model_path": "/model/llama-2-7b-model.gguf",
"ctx_len": 512,
"ngl": 100,
}'
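The same load request can be scripted. Below is a minimal Python sketch (an illustration, not part of Nitro itself) that builds the JSON body shown above and posts it with the standard library; it assumes the server is running on the default port 3928.

```python
import json
from urllib import request

def build_load_payload(model_path, ctx_len=512, ngl=100):
    """Build the JSON body for the loadmodel endpoint shown above."""
    return {"llama_model_path": model_path, "ctx_len": ctx_len, "ngl": ngl}

def load_model(payload, host="http://localhost:3928"):
    """POST the payload to the loadmodel endpoint (requires a running Nitro server)."""
    req = request.Request(
        host + "/inferences/llamacpp/loadmodel",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    return request.urlopen(req)

payload = build_load_payload("/model/llama-2-7b-model.gguf")
# load_model(payload)  # uncomment once the Nitro server is running
```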
Step 5: Making an Inference
curl http://localhost:3928/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"messages": [
{
"role": "user",
"content": "Who won the world series in 2020?"
}
]
}'
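Because the endpoint follows the OpenAI chat format, the same request is easy to issue from code. A minimal Python sketch (illustrative only; assumes a server on the default port with a model already loaded):

```python
import json
from urllib import request

def build_chat_payload(user_content):
    """Build an OpenAI-style chat body for the /v1/chat/completions route."""
    return {"messages": [{"role": "user", "content": user_content}]}

def chat(payload, host="http://localhost:3928"):
    """POST to the OpenAI-compatible endpoint and return the parsed JSON reply."""
    req = request.Request(
        host + "/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

payload = build_chat_payload("Who won the world series in 2020?")
# print(chat(payload))  # uncomment once a model is loaded on a running server
```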
Table of parameters
Parameter | Type | Description
---|---|---
llama_model_path | String | The file path to the LLaMA model.
ngl | Integer | The number of GPU layers to use.
ctx_len | Integer | The context length for the model operations.
embedding | Boolean | Whether to use embedding in the model.
n_parallel | Integer | The number of parallel operations.
cont_batching | Boolean | Whether to use continuous batching.
user_prompt | String | The prompt to use for the user.
ai_prompt | String | The prompt to use for the AI assistant.
system_prompt | String | The prompt to use for system rules.
pre_prompt | String | The prompt to use for internal configuration.
cpu_threads | Integer | The number of threads to use for inferencing (CPU mode only).
n_batch | Integer | The batch size for the prompt eval step.
caching_enabled | Boolean | Whether to enable prompt caching.
clean_cache_threshold | Integer | The number of chats that triggers the clean-cache action.
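To see how these parameters fit together, here is an illustrative loadmodel body combining several of them. The values are examples only, not recommendations; the sketch just serializes the body you would send to the loadmodel endpoint.

```python
import json

# Illustrative loadmodel body using parameters from the table above.
# Values are examples, not tuned recommendations.
payload = {
    "llama_model_path": "/model/llama-2-7b-model.gguf",
    "ctx_len": 2048,
    "ngl": 100,
    "embedding": False,
    "n_parallel": 2,
    "cont_batching": True,
    "cpu_threads": 4,
    "n_batch": 512,
    "caching_enabled": True,
    "clean_cache_threshold": 5,
}
body = json.dumps(payload)  # JSON string suitable for the curl -d argument
```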
OPTIONAL: You can run Nitro on a different port, such as 5000 instead of 3928, by starting it manually in a terminal:
./nitro 1 127.0.0.1 5000 ([thread_num] [host] [port])
Nitro server is compatible with the OpenAI format, so you can expect the same output as the OpenAI ChatGPT API.
To compile Nitro, please visit Compile from source.
Version Type | Availability
---|---
Stable (Recommended) | Windows, MacOS, and Linux downloads (see GitHub Releases)
Experimental (Nightly Build) | GitHub Action artifacts
Download the latest version of Nitro at https://nitro.jan.ai/ or visit the GitHub Releases to download any previous release.
Nightly build is a process where the software is built automatically every night. This helps in detecting and fixing bugs early in the development cycle. The process for this project is defined in .github/workflows/build.yml
You can join our Discord server here and go to channel github-nitro to monitor the build process.
The nightly build is triggered at 2:00 AM UTC every day.
The nightly build can be downloaded from the URL posted in the Discord channel. Open the URL in a browser and download the build artifacts from there.
Manual build is a process where the software is built manually by the developers. This is usually done when a new feature is implemented or a bug is fixed. The process for this project is defined in .github/workflows/build.yml
It is similar to the nightly build process, except that it is triggered manually by the developers.