Serverless inference API
A single endpoint for open-source large language models, billed per token. Send a request, get a completion. The platform loads, batches, and scales the model behind the API, so there is nothing to provision and nothing to keep warm.
- Pay-per-token access to a catalog of open-source LLMs
- No servers to stand up, autoscale, or keep running
- A plain HTTP API that drops into code you already have
- Ideal for prototypes, LLM-powered product features, and bursty traffic
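
As a rough illustration of the "plain HTTP API" point, the sketch below sends one completion request using only the Python standard library. The endpoint URL, model name, JSON field names, and bearer-token header are all placeholders assumed for this example, not the platform's documented interface; check the API reference for the real values.

```python
import json
import urllib.request

# Hypothetical values: substitute the real endpoint and a model from the catalog.
BASE_URL = "https://api.example.com/v1/completions"
MODEL = "example-open-model"


def build_request(prompt: str, api_key: str, max_tokens: int = 128) -> urllib.request.Request:
    """Build a POST request. The JSON field names are assumed, not documented."""
    payload = {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Bearer-token auth is an assumption for this sketch.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def complete(prompt: str, api_key: str) -> dict:
    """Send the request and return the parsed JSON response body."""
    with urllib.request.urlopen(build_request(prompt, api_key), timeout=30) as resp:
        return json.load(resp)
```

Calling `complete("Summarize this ticket in one line.", api_key)` would perform the network call. Keeping `build_request` separate makes it possible to inspect the payload without sending anything, and the `max_tokens` cap is one simple way to bound the per-token cost of a single request.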