Inference & Serving

NIM

NVIDIA's prebuilt, containerized microservices for deploying AI models anywhere.

Definition

NVIDIA NIM (NVIDIA Inference Microservices) is a set of ready-to-run, pre-tuned packages that bundle a model together with the software that runs it, a standard connection interface, and everything else it needs into one self-contained unit. The goal is to make putting generative AI into use as simple as calling a hosted service, while keeping the model running on the owner's own NVIDIA hardware (cloud, data center, workstation, or RTX PC). Under the hood NIM is built on engines like Triton Inference Server, TensorRT, TensorRT-LLM, and vLLM/SGLang, and it accepts requests in the widely used OpenAI format.

NIM

Definition

Related terms