Skip to main content
All terms
Inference & Serving

OpenAI Compatible API

An endpoint that mirrors OpenAI's request and response format so existing clients work unchanged.

Definition

An OpenAI-compatible API exposes the same request and response formats as OpenAI's chat completions and completions endpoints. This lets existing client code, SDKs, and tools work with alternative inference servers such as vLLM, SGLang, Ollama, and TGI with little or no modification. It has become the de facto interface for self-hosted and third-party model serving, reducing the cost of switching backends.