Safetensors stores tensors in a simple binary format with a metadata header describing tensor names, data types, shapes, and offsets within the file. This allows the library to read file contents without executing any code embedded in the file. The format is designed to support fast data access and enables zero-copy or partial memory-mapping scenarios, depending on the framework and environment in use. Implementations exist for major ecosystems including PyTorch, TensorFlow, JAX, PaddlePaddle, and NumPy.
Safetensors addresses the unsafe deserialization of model weights in formats such as pickle and the associated security risks. Pickle-based checkpoint formats can execute arbitrary code during loading, which poses a threat when downloading models from external sources. Additionally, many legacy formats were not designed for fast, simple, and predictable large-scale tensor access. Safetensors mitigates these risks by providing a secure, straightforward, and efficient format for storing tensors.
First 8 bytes of the safetensors file. Stores the JSON header size as a 64-bit unsigned integer (uint64) in little-endian byte order. Enables immediate location of the JSON header without parsing tensor data.
Variable-length UTF-8 JSON section immediately following the header size field. Contains a dictionary mapping tensor names to their dtype (e.g., F16, BF16, F32), shape (array of dimension integers), and data_offsets ([BEGIN, END] relative to the start of the data region). Optional __metadata__ key stores arbitrary string-to-string pairs. Size bounded to 100 MB by MAX_HEADER_SIZE.
Contiguous block of raw bytes storing all tensor data in C (row-major) order, without compression or padding between tensors. Offsets from the JSON header are relative to the start of this buffer (not the file start). Tensors must be packed before serialization; striding is not supported.
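The three-part layout described above (8-byte size field, JSON header, data buffer) can be sketched in pure Python. This is a minimal illustration, not the reference implementation: the helper names `build_safetensors` and `parse_safetensors` are hypothetical, the `MAX_HEADER_SIZE` value is taken as 100,000,000 bytes on the assumption that this is what "100 MB" means here, and the sketch skips validation a production parser would need (dtype checks, overlap checks, shape-versus-length checks).

```python
import json
import struct

MAX_HEADER_SIZE = 100_000_000  # assumed value for the 100 MB header bound


def build_safetensors(tensors):
    """Serialize {name: (dtype, shape, raw_bytes)} into a safetensors-style blob."""
    header, chunks, offset = {}, [], 0
    for name, (dtype, shape, raw) in tensors.items():
        # Offsets are relative to the start of the data buffer, not the file.
        header[name] = {
            "dtype": dtype,
            "shape": shape,
            "data_offsets": [offset, offset + len(raw)],
        }
        offset += len(raw)
        chunks.append(raw)
    header_bytes = json.dumps(header).encode("utf-8")
    # First 8 bytes: header size as a little-endian uint64.
    return struct.pack("<Q", len(header_bytes)) + header_bytes + b"".join(chunks)


def parse_safetensors(blob):
    """Return (metadata, {name: (dtype, shape, raw_bytes)}) from a blob."""
    (header_size,) = struct.unpack("<Q", blob[:8])
    if header_size > MAX_HEADER_SIZE:
        raise ValueError("header too large")
    header = json.loads(blob[8:8 + header_size].decode("utf-8"))
    data = memoryview(blob)[8 + header_size:]
    tensors = {}
    for name, info in header.items():
        if name == "__metadata__":
            continue
        begin, end = info["data_offsets"]
        if not 0 <= begin <= end <= len(data):
            raise ValueError(f"offsets out of range for {name!r}")
        tensors[name] = (info["dtype"], info["shape"], bytes(data[begin:end]))
    return header.get("__metadata__", {}), tensors


# Round-trip two small F32 tensors stored as little-endian floats.
demo_blob = build_safetensors({
    "weight": ("F32", [2, 2], struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)),
    "bias": ("F32", [2], struct.pack("<2f", 0.5, -0.5)),
})
demo_meta, demo_tensors = parse_safetensors(demo_blob)
demo_weight = struct.unpack("<4f", demo_tensors["weight"][2])
```

Because the header alone gives every tensor's byte range, a reader can locate one tensor without touching the bytes of any other.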
PyTorch allows multiple tensors to share the same underlying memory storage. The safetensors PyTorch adapter includes special logic for detecting and handling shared tensors. Serializing models with shared tensors without this handling may lead to data duplication or errors. After deserialization, memory sharing is lost: each tensor is independent.
Safetensors does not use compression. Tensor data is stored as raw bytes. For models with low entropy (e.g., highly sparse weights or quantized models with many zeros), file size may be significantly larger than with compressed serialization formats.
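The size cost of storing raw bytes can be seen by compressing a low-entropy buffer with a general-purpose compressor. The sketch below uses `zlib` from the Python standard library purely for comparison; the all-zero buffer is an artificial stand-in for highly sparse weights, not a measurement of any real model.

```python
import zlib

# 1,000,000 zero bytes: an extreme case of low-entropy tensor data.
raw = bytes(1_000_000)

# Safetensors would store these bytes as-is; zlib shows how much a
# compressed format could save on the same data.
compressed = zlib.compress(raw)

raw_size = len(raw)
compressed_size = len(compressed)
print(f"raw: {raw_size} bytes, zlib: {compressed_size} bytes")
```

For dense floating-point weights the gap is typically far smaller, which is part of why the format trades compression for direct, offset-based access.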
The JSON specification does not formally define behavior for duplicate keys. The Trail of Bits audit found that the Hugging Face reference implementation rejects files with duplicate keys, but some third-party JSON parsers accept them with undefined behavior. A malicious file may thus behave differently across implementations.
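The divergence between parsers is easy to reproduce with Python's standard `json` module, which by default keeps only the last value for a repeated key. The sketch below shows that lenient behavior and one way a third-party reader could reject duplicates instead. The `reject_duplicates` hook is a hypothetical helper written for this illustration; it is not part of the safetensors library.

```python
import json


def reject_duplicates(pairs):
    """object_pairs_hook that raises on a repeated key within one JSON object."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key in header: {key!r}")
        seen[key] = value
    return seen


# A header in which the tensor name "w" appears twice with different offsets.
header_text = (
    '{"w": {"dtype": "F32", "shape": [1], "data_offsets": [0, 4]},'
    ' "w": {"dtype": "F32", "shape": [1], "data_offsets": [4, 8]}}'
)

# Default behavior: no error, the last occurrence silently wins.
lenient = json.loads(header_text)

# Strict behavior: the hook runs for every JSON object and refuses duplicates.
try:
    json.loads(header_text, object_pairs_hook=reject_duplicates)
    strict_rejected = False
except ValueError:
    strict_rejected = True
```

A parser that instead kept the first occurrence would read bytes 0 to 4 where Python's default reads bytes 4 to 8, which is the cross-implementation ambiguity the audit describes.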
The safetensors format does not include a built-in data integrity mechanism (e.g., SHA-256 hash of tensors). File corruption during transmission or storage may not be detected at load time: the format validates structure and offsets but not data checksums.
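Since the format carries no checksum, integrity has to be verified outside it, for example by comparing the file against a digest published alongside it. The sketch below uses `hashlib` from the standard library. The helper names `sha256_of_file` and `verify_file` and the file name are hypothetical, and the 16-byte file is a placeholder rather than a valid safetensors file.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large checkpoints need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, expected_hex):
    """Return True only if the file's SHA-256 matches the expected digest."""
    return sha256_of_file(path) == expected_hex


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.safetensors")  # placeholder file name
    with open(path, "wb") as f:
        f.write(b"\x00" * 16)

    expected = sha256_of_file(path)       # digest recorded at publish time
    ok_before = verify_file(path, expected)

    # Simulate silent corruption: flip one byte in place.
    with open(path, "r+b") as f:
        f.seek(8)
        f.write(b"\x01")

    ok_after = verify_file(path, expected)
```

A structural check alone would not catch the flipped byte if it fell inside tensor data, because the header and offsets would still be consistent.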
Nicolas Patry at Hugging Face published the first version of the safetensors library and format specification. Rust core, Python bindings via PyO3, PyTorch and NumPy support. Format designed as a secure and fast alternative to pickle.
Independent security audit by Trail of Bits, commissioned by Hugging Face, EleutherAI, and Stability AI. No critical vulnerabilities found. Hugging Face Hub adopted safetensors as preferred format, displaying warnings for pickle-format models.
PyTorch merged native safetensors support into its core serialization API (weights_only parameter and safetensors format option in the save API). This marks institutional endorsement by the leading deep learning framework, relegating pickle to legacy status.
Time complexity: O(n), where n is the total size of the tensor data being loaded. Space complexity: O(n).
Tensors in the data buffer are independent: they can be loaded in parallel by multiple threads or processes. JSON header parsing is sequential but takes negligible time compared to data loading. The format supports distributed loading: each node can load a different subset of tensors (tensor parallelism sharding, used in TGI).
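The independence of tensors in the data buffer can be illustrated with a small reader that parses the header once and then fetches each tensor's byte range in a separate thread. This is a sketch under simplifying assumptions: it handles only F32 data, all function names (`write_demo_file`, `read_header`, `read_tensor`, `load_parallel`) are hypothetical, and it uses plain file reads rather than the memory mapping a real loader would likely use.

```python
import json
import os
import struct
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_demo_file(path):
    """Write a tiny safetensors-style file holding two F32 tensors."""
    payloads = {
        "layer0": struct.pack("<2f", 1.0, 2.0),
        "layer1": struct.pack("<2f", 3.0, 4.0),
    }
    header, offset = {}, 0
    for name, raw in payloads.items():
        header[name] = {"dtype": "F32", "shape": [2],
                        "data_offsets": [offset, offset + len(raw)]}
        offset += len(raw)
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))
        f.write(header_bytes)
        for raw in payloads.values():
            f.write(raw)


def read_header(path):
    """Sequential step: parse the header and locate the data buffer."""
    with open(path, "rb") as f:
        (size,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(size))
    return header, 8 + size


def read_tensor(path, data_start, info):
    """Read one tensor's byte range; each call opens its own file handle."""
    begin, end = info["data_offsets"]
    with open(path, "rb") as f:
        f.seek(data_start + begin)
        raw = f.read(end - begin)
    return struct.unpack(f"<{(end - begin) // 4}f", raw)  # F32 only


def load_parallel(path, names=None):
    """Load all tensors, or only the subset in `names`, using a thread pool."""
    header, data_start = read_header(path)
    wanted = [n for n in header
              if n != "__metadata__" and (names is None or n in names)]
    with ThreadPoolExecutor() as pool:
        futures = {n: pool.submit(read_tensor, path, data_start, header[n])
                   for n in wanted}
        return {n: fut.result() for n, fut in futures.items()}


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.safetensors")
    write_demo_file(demo_path)
    everything = load_parallel(demo_path)
    shard = load_parallel(demo_path, names={"layer1"})
```

Passing a subset of names mirrors the sharded case: each node reads the same small header but only the byte ranges for the tensors it owns.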
Safetensors is a file format, not a computational algorithm. The loading operation (JSON header parsing, memory-mapping) is hardware-agnostic and efficient both on CPU and as a preliminary stage before data transfer to GPU/TPU. The format supports framework-agnostic loading of the same tensors into PyTorch, TensorFlow, JAX, and MLX.