NVIDIA's parallel programming platform for GPUs, available since 2007 and the foundation of most modern AI/ML workloads. It includes a toolkit (with the nvcc compiler), a runtime API, accelerated libraries (cuBLAS, cuDNN, NCCL, CUTLASS), and dozens of domain SDKs.

CUDA (Compute Unified Device Architecture) is a parallel programming platform and programming model for GPUs, created by NVIDIA. It was announced in November 2006 alongside the G80 GPU (GeForce 8 series, Tesla microarchitecture), and version 1.0 was released in June 2007. Originally a general-purpose GPU computing (GPGPU) stack, over the last decade it has become the dominant execution layer for AI: the major ML frameworks (PyTorch, TensorFlow, JAX) target CUDA for NVIDIA GPU acceleration, most LLMs and diffusion models are trained and served on it, and NVIDIA's robotics simulation tools (Isaac Sim, Omniverse) are built on it. As of September 2025, the latest major release is CUDA 13.0.
The CUDA stack consists of: (1) the Driver API and Runtime API (C/C++), the low-level interfaces to the GPU; (2) the nvcc compiler and the CUDA C/C++ language (a C++ extension with the `__global__` and `__device__` function qualifiers, kernels, and a grid/block/thread hierarchy); (3) accelerated libraries: cuBLAS (BLAS), cuDNN (deep learning primitives), cuFFT, cuRAND, cuSPARSE, cuSOLVER, NCCL (multi-GPU and multi-node collective communication), CUTLASS (template-based linear algebra), and Thrust (parallel algorithms in the style of the C++ STL); (4) higher layers: TensorRT (inference engine), Triton Inference Server, NVIDIA NeMo, Isaac, Omniverse, RAPIDS, and Modulus.
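To make the language extensions in item (2) concrete, here is a minimal sketch of a SAXPY kernel (y = a*x + y). The names `saxpy`, `scaledSum`, and `blocksFor`, the array size, and the block size of 256 threads are illustrative assumptions, not part of any NVIDIA API; only the `cuda*` runtime calls and the built-in variables `blockIdx`, `blockDim`, and `threadIdx` come from CUDA itself.

```cuda
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// __device__: runs on the GPU and is callable only from GPU code.
__device__ float scaledSum(float a, float x, float y) {
    return a * x + y;
}

// __global__: a kernel, launched from the host and executed by many GPU threads.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    // Global index of this thread in a 1D grid of 1D blocks.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // guard: the last block may extend past the end of the array
        y[i] = scaledSum(a, x[i], y[i]);
    }
}

// Illustrative helper: blocks needed to cover n elements (ceiling division).
int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host buffers.
    float* hx = static_cast<float*>(malloc(bytes));
    float* hy = static_cast<float*>(malloc(bytes));
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    // Device buffers, filled from the host copies.
    float* dx = nullptr;
    float* dy = nullptr;
    cudaMalloc(&dx, bytes);
    cudaMalloc(&dy, bytes);
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

    // Launch configuration: <<<number of blocks, threads per block>>>.
    const int threadsPerBlock = 256;
    saxpy<<<blocksFor(n, threadsPerBlock), threadsPerBlock>>>(n, 3.0f, dx, dy);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);

    // Every element should now equal 3 * 1 + 2 = 5.
    float maxError = 0.0f;
    for (int i = 0; i < n; ++i) {
        maxError = fmaxf(maxError, fabsf(hy[i] - 5.0f));
    }
    printf("max error: %f\n", maxError);

    cudaFree(dx);
    cudaFree(dy);
    free(hx);
    free(hy);
    return 0;
}
```

Assuming the file is saved as `saxpy.cu`, it can be built with `nvcc saxpy.cu -o saxpy`. Production code would check the return value of every runtime call, and would normally call cuBLAS for an operation like this rather than use a hand-written kernel.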
Hardware: CUDA runs exclusively on NVIDIA GPUs, from G80 (Tesla microarchitecture) through Hopper and Blackwell, with Rubin announced as the next generation. It covers the full product range: consumer RTX cards, data-center H100/H200/B200, embedded Jetson modules, and the Grace Hopper superchip. CUDA is largely proprietary (the driver and most libraries are closed-source), but some components are open source, for example CUTLASS, NCCL, Thrust, and sample code. C/C++ is supported natively through nvcc, and Fortran through the NVIDIA HPC SDK. For Python, NVIDIA publishes CUDA Python, and CuPy is a widely used community library. Julia (CUDA.jl) and Rust (cust) are covered by community-maintained bindings. CUDA is the de facto standard for AI acceleration: alternatives exist (AMD ROCm, Intel oneAPI, Apple Metal), but CUDA's ecosystem is the largest.
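Because CUDA spans many GPU generations, programs often check at startup which toolkit version and which devices are present. The following is a small sketch using the Runtime API. The helpers `versionMajor` and `versionMinor` are illustrative names; they assume the documented encoding of CUDA versions as 1000 * major + 10 * minor. The compute capability (`prop.major`, `prop.minor`) is the number CUDA uses to identify a GPU's architecture generation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative helpers: decode a version encoded as 1000 * major + 10 * minor.
int versionMajor(int v) { return v / 1000; }
int versionMinor(int v) { return (v % 1000) / 10; }

int main() {
    int runtimeVersion = 0;
    int driverVersion = 0;
    cudaRuntimeGetVersion(&runtimeVersion);  // version of the linked CUDA runtime
    cudaDriverGetVersion(&driverVersion);    // newest CUDA version the driver supports
    printf("runtime %d.%d, driver supports up to %d.%d\n",
           versionMajor(runtimeVersion), versionMinor(runtimeVersion),
           versionMajor(driverVersion), versionMinor(driverVersion));

    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        printf("no usable CUDA device: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int d = 0; d < deviceCount; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        printf("GPU %d: %s, compute capability %d.%d, %d SMs, %.1f GiB\n",
               d, prop.name, prop.major, prop.minor,
               prop.multiProcessorCount,
               prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    }
    return 0;
}
```

The output depends entirely on the machine. On a system without an NVIDIA driver, the program prints the error string and exits with a nonzero status.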