cheahjs / free-llm-api-resources
Wednesday, February 11, 2026, at 00:00:03
A list of free LLM inference resources accessible via API.
This lists various services that provide free access or credits towards API-based LLM usage.
Note
Please don't abuse these services, or we might lose them.
Warning
This list explicitly excludes any services that are not legitimate (e.g., services that reverse-engineer an existing chatbot).
Limits:
20 requests/minute
50 requests/day
Up to 1,000 requests/day with a $10 lifetime top-up
Models share a common quota.
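These limits apply together, and every model draws from the same quota. Below is a minimal client-side sketch in Python. It assumes both limits are rolling windows; the provider may instead reset the daily count at a fixed time. `SlidingWindowLimiter` and `try_acquire` are hypothetical names, not part of any provider SDK. One instance is shared across all models to mirror the common quota.

```python
import time
from collections import deque
from typing import Callable, Deque, List, Tuple


class SlidingWindowLimiter:
    """Client-side limiter that enforces several request windows at once.

    Each window is (max_requests, window_seconds). Sharing one instance
    across all models mirrors a quota that every model draws from.
    Assumption: the provider counts rolling windows, not calendar resets.
    """

    def __init__(self, windows: List[Tuple[int, float]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.windows = windows
        self.clock = clock
        # One log of accepted-request timestamps per window.
        self.logs: List[Deque[float]] = [deque() for _ in windows]

    def try_acquire(self) -> bool:
        """Return True and record the request if every window has room."""
        now = self.clock()
        for (limit, span), log in zip(self.windows, self.logs):
            # Forget requests that have slid out of this window.
            while log and now - log[0] >= span:
                log.popleft()
            if len(log) >= limit:
                return False  # rejected requests are not recorded
        for log in self.logs:
            log.append(now)
        return True


# Free tier from the list above: 20 requests/minute and 50 requests/day.
limiter = SlidingWindowLimiter([(20, 60.0), (50, 86_400.0)])
```

With the $10 top-up, the daily window would become `(1_000, 86_400.0)`. When `try_acquire()` returns `False`, the caller should wait before retrying rather than keep calling the endpoint.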
Data is used for training when the service is accessed from outside the UK/CH/EEA/EU.
| Model Name | Model Limits |
|---|---|
| Gemini 3 Flash | 250,000 tokens/minute; 20 requests/day; 5 requests/minute |
| Gemini 2.5 Flash | 250,000 tokens/minute; 20 requests/day; 5 requests/minute |
| Gemini 2.5 Flash-Lite | 250,000 tokens/minute; 20 requests/day; 10 requests/minute |
| Gemma 3 27B Instruct | 15,000 tokens/minute; 14,400 requests/day; 30 requests/minute |
| Gemma 3 12B Instruct | 15,000 tokens/minute; 14,400 requests/day; 30 requests/minute |
| Gemma 3 4B Instruct | 15,000 tokens/minute; 14,400 requests/day; 30 requests/minute |
| Gemma 3 1B Instruct | 15,000 tokens/minute; 14,400 requests/day; 30 requests/minute |
Phone number verification is required. Models tend to have limited context windows.
Limits: 40 requests/minute
Limits (per-model): 1 request/second, 500,000 tokens/minute, 1,000,000,000 tokens/month
Limits: 30 requests/minute, 2,000 requests/day
HuggingFace Serverless Inference is limited to models smaller than 10GB, though some popular models are supported even if they exceed 10GB.
Limits: $0.10/month in credits
Routes to various supported providers.
Limits: $5/month
| Model Name | Model Limits |
|---|---|
| gpt-oss-120b | 30 requests/minute; 60,000 tokens/minute; 900 requests/hour; 1,000,000 tokens/hour; 14,400 requests/day; 1,000,000 tokens/day |
| Qwen 3 235B A22B Instruct | 30 requests/minute; 60,000 tokens/minute; 900 requests/hour; 1,000,000 tokens/hour; 14,400 requests/day; 1,000,000 tokens/day |
| Llama 3.3 70B | 30 requests/minute; 64,000 tokens/minute; 900 requests/hour; 1,000,000 tokens/hour; 14,400 requests/day; 1,000,000 tokens/day |
| Qwen 3 32B | 30 requests/minute; 64,000 tokens/minute; 900 requests/hour; 1,000,000 tokens/hour; 14,400 requests/day; 1,000,000 tokens/day |
| Llama 3.1 8B | 30 requests/minute; 60,000 tokens/minute; 900 requests/hour; 1,000,000 tokens/hour; 14,400 requests/day; 1,000,000 tokens/day |
| Z.ai GLM-4.6 | 10 requests/minute; 60,000 tokens/minute; 100 requests/hour; 100,000 tokens/hour; 100 requests/day; 1,000,000 tokens/day |
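Each row combines request and token limits over three windows, so the limit that actually binds depends on request size. Below is a rough Python sketch. It assumes a hypothetical fixed cost of 2,000 tokens per request; whether a provider counts prompt and completion tokens together is also an assumption. It further assumes traffic is spread evenly over the day. `daily_request_caps` is an illustrative helper, not a provider API.

```python
from typing import Dict, Tuple

# Minutes per window; a "day" is treated as 1,440 minutes of even usage.
MINUTES_PER_WINDOW = {"minute": 1, "hour": 60, "day": 1_440}

Limit = Tuple[str, str]  # (unit, window), e.g. ("tokens", "hour")


def daily_request_caps(limits: Dict[Limit, int],
                       tokens_per_request: int) -> Dict[Limit, int]:
    """Requests/day that each limit would allow on its own.

    Assumes every request costs the same number of tokens and that each
    window is filled back to back for the whole day.
    """
    caps: Dict[Limit, int] = {}
    for (unit, window), amount in limits.items():
        windows_per_day = MINUTES_PER_WINDOW["day"] // MINUTES_PER_WINDOW[window]
        if unit == "requests":
            per_window = amount
        else:  # "tokens": how many whole requests fit in the token budget
            per_window = amount // tokens_per_request
        caps[(unit, window)] = per_window * windows_per_day
    return caps


# The gpt-oss-120b row from the table above.
gpt_oss_120b: Dict[Limit, int] = {
    ("requests", "minute"): 30,
    ("tokens", "minute"): 60_000,
    ("requests", "hour"): 900,
    ("tokens", "hour"): 1_000_000,
    ("requests", "day"): 14_400,
    ("tokens", "day"): 1_000_000,
}

caps = daily_request_caps(gpt_oss_120b, tokens_per_request=2_000)  # hypothetical size
binding = min(caps, key=lambda k: caps[k])
print(binding, caps[binding])  # ('tokens', 'day') 500
```

At 2,000 tokens per request, the tokens/day budget allows only 1,000,000 / 2,000 = 500 requests. That is far below the 14,400 requests/day cap, so the request-count caps only start to bind for very short requests.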
| Model Name | Model Limits |
|---|---|
| Allam 2 7B | 7,000 requests/day; 6,000 tokens/minute |
| Llama 3.1 8B | 14,400 requests/day; 6,000 tokens/minute |
| Llama 3.3 70B | 1,000 requests/day; 12,000 tokens/minute |
| Llama 4 Maverick 17B 128E Instruct | 1,000 requests/day; 6,000 tokens/minute |
| Llama 4 Scout Instruct | 1,000 requests/day; 30,000 tokens/minute |
| Whisper Large v3 | 7,200 audio-seconds/minute; 2,000 requests/day |
| Whisper Large v3 Turbo | 7,200 audio-seconds/minute; 2,000 requests/day |
| canopylabs/orpheus-arabic-saudi | |
| canopylabs/orpheus-v1-english | |
| groq/compound | 250 requests/day; 70,000 tokens/minute |
| groq/compound-mini | 250 requests/day; 70,000 tokens/minute |
| meta-llama/llama-guard-4-12b | 14,400 requests/day; 15,000 tokens/minute |
| meta-llama/llama-prompt-guard-2-22m | |
| meta-llama/llama-prompt-guard-2-86m | |
| moonshotai/kimi-k2-instruct | 1,000 requests/day; 10,000 tokens/minute |
| moonshotai/kimi-k2-instruct-0905 | 1,000 requests/day; 10,000 tokens/minute |
| openai/gpt-oss-120b | 1,000 requests/day; 8,000 tokens/minute |
| openai/gpt-oss-20b | 1,000 requests/day; 8,000 tokens/minute |
| openai/gpt-oss-safeguard-20b | 1,000 requests/day; 8,000 tokens/minute |
| qwen/qwen3-32b | 1,000 requests/day; 6,000 tokens/minute |
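Most rows above pair a requests/day cap with a tokens/minute cap. The tokens/minute side can be paced client-side with a token bucket. This is a standard rate-limiting technique swapped in for illustration; the list does not say the providers meter tokens this way. The sketch assumes you can estimate a request's token count before sending it. `TokenBucket`, `wait_time`, and `consume` are hypothetical names.

```python
import time
from typing import Callable, Optional


class TokenBucket:
    """Continuous-refill bucket that paces LLM tokens against a per-minute budget.

    The bucket refills at tokens_per_minute / 60 per second, up to `capacity`.
    Caveat: a full bucket plus one minute of refill can admit up to about twice
    the budget inside a single rolling 60 s span, so pass a smaller capacity if
    the provider enforces a strict rolling minute.
    """

    def __init__(self, tokens_per_minute: int, capacity: Optional[int] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = tokens_per_minute / 60.0  # tokens regained per second
        self.capacity = float(capacity if capacity is not None else tokens_per_minute)
        self.level = self.capacity  # start with a full bucket
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        self.level = min(self.capacity, self.level + (now - self.last) * self.rate)
        self.last = now

    def wait_time(self, cost: int) -> float:
        """Seconds until a request estimated at `cost` tokens fits (0.0 if now)."""
        if cost > self.capacity:
            raise ValueError("request exceeds the bucket capacity")
        self._refill()
        return max(0.0, (cost - self.level) / self.rate)

    def consume(self, cost: int) -> bool:
        """Spend `cost` tokens if they are available right now."""
        if self.wait_time(cost) > 0.0:
            return False
        self.level -= cost
        return True


# Example: a model limited to 6,000 tokens/minute in the table above.
bucket = TokenBucket(6_000)
```

A caller would call `consume(n)` in a loop, sleeping for `wait_time(n)` between attempts. The requests/day column still needs its own counter, and the token estimate is only a guess until the API reports actual usage.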
Limits:
20 requests/minute
1,000 requests/month
Models share a common monthly quota.
Extremely restrictive input/output token limits.
Limits: Dependent on Copilot subscription tier (Free/Pro/Pro+/Business/Enterprise)
Limits: 10,000 neurons/day
Very stringent payment verification for Google Cloud.
| Model Name | Model Limits |
|---|---|
| Llama 3.2 90B Vision Instruct | 30 requests/minute; free during preview |
| Llama 3.1 70B Instruct | 60 requests/minute; free during preview |
| Llama 3.1 8B Instruct | 60 requests/minute; free during preview |
Credits: $1
Models: Various open models
Credits: $30
Models: Any supported model (pay by compute time)
Credits: $1
Models: Various open models
Credits: $0.50 for 1 year
Models: Various open models
Credits: $10 for 3 months
Models: Jamba family of models
Credits: $10 for 3 months
Models: Solar Pro/Mini
Credits: $15
Requirements: Phone number verification
Models: Various open models
Credits: 1 million tokens/model
Models: Various open and proprietary Qwen models
Credits: $5/month upon sign-up, $30/month with a payment method added
Models: Any supported model (pay by compute time)
Credits: $1; $25 after responding to an email survey
Models: Various open models
Credits: $1
Models:
Credits: $5 for 3 months
Models:
Credits: 1,000,000 free tokens
Models: