Mixtral 8x7B

8x7B v0.1 · Family: Mistral
Open-weights sparse mixture-of-experts model from Mistral AI: 46.7B total parameters (12.9B active per token), 32K context window, Apache 2.0 license.
โš  Deprecatedโœ“ Public accessโš– Open sourceLLM๐Ÿ“ Mistral
Context window: 32K tokens
Parameters: 46.7B total / 12.9B active
Release date: 11 December 2023
Access: API, Download · Deployment: 💻 Local, ☁ Cloud

Overview

Mixtral 8x7B is a decoder-only Sparse Mixture-of-Experts (SMoE) language model released by Mistral AI on December 11, 2023 under the Apache 2.0 license. At every layer and for every token, a router network selects 2 of the 8 expert groups in the feed-forward block and combines their outputs additively, weighting each by its router score. This yields 46.7B total parameters while only ~12.9B are active per token, keeping inference cost and latency comparable to those of a 12.9B model.
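The routing step can be sketched in a few lines. This is a minimal, illustrative sketch that assumes the top-2 softmax gating described in the Mixtral paper (a bias-free linear router, with softmax applied only over the two highest logits). The function names, toy experts and router weights are hypothetical; real Mixtral experts are SwiGLU feed-forward blocks acting on hidden-state vectors, not scalar multipliers.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    x: Vector,
    router_weights: Sequence[Vector],
    experts: Sequence[Expert],
    top_k: int = 2,
) -> Tuple[Vector, List[int]]:
    """Route one token through a sparse MoE block with top-k gating (sketch)."""
    # Router: one logit per expert, computed as a dot product (bias-free linear layer).
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    # Keep the top_k experts by logit; the remaining experts are never evaluated.
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Softmax over the selected logits only, so the gate weights sum to 1.
    gates = softmax([logits[i] for i in chosen])
    # Additive combination: gate-weighted sum of the chosen experts' outputs.
    out = [0.0] * len(x)
    for g, i in zip(gates, chosen):
        y = experts[i](x)
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, chosen


# Hypothetical toy setup: 8 experts, expert i scales its input by i + 1,
# and the router weights favour higher-numbered experts.
experts = [lambda v, s=i + 1: [s * t for t in v] for i in range(8)]
router_weights = [[0.1 * i, 0.0] for i in range(8)]

out, chosen = moe_forward([1.0, 0.5], router_weights, experts)
print(chosen)  # [7, 6]
print(out)     # approximately [7.525, 3.762]
```

Only two of the eight expert functions run for this token, which is the source of the gap between total and active parameters.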

The model supports a 32K-token context window and five languages: English, French, Italian, German and Spanish. Its Instruct variant, fine-tuned with SFT and DPO, scores 8.30 on MT-Bench. Mixtral 8x7B was distributed both as downloadable weights and via the Mistral API as open-mixtral-8x7b. It was marked deprecated on November 30, 2024 and retired from the Mistral API on March 30, 2025.

Classification
LLM
Family: Mistral
Access & deployment
API, Download
Local, Cloud
Weights: Open source
Key parameters
Context: 32K
🧩 Parameters: 46.7B total / 12.9B active
✓ Fine-tuning
📥 Input: text

Technical specification

Context window: 32K tokens
Parameters: 46.7B total / 12.9B active
License: Apache 2.0
Features: ✓ Fine-tuning
Modalities
⬇ Input: text
⬆ Output: text, code
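The 46.7B total / 12.9B active split can be reproduced with simple arithmetic. The sketch below assumes the hyperparameters reported in the Mixtral paper (model dimension 4096, 32 layers, expert hidden size 14336, 32 query heads and 8 key/value heads of size 128, a 32,000-token vocabulary), plus untied input/output embeddings and bias-free linear layers. None of these values appear on this page, so treat them as assumptions. Active parameters count only 2 of the 8 experts per layer; attention, router, norms and embeddings are always used.

```python
# Assumed hyperparameters (from the Mixtral paper, not stated on this page).
DIM = 4096          # model (hidden) dimension
FFN_DIM = 14336     # hidden size of each expert's feed-forward block
N_LAYERS = 32
N_HEADS = 32
N_KV_HEADS = 8      # grouped-query attention: fewer key/value heads
HEAD_DIM = 128
VOCAB = 32000
N_EXPERTS = 8
TOP_K = 2


def count_params(experts_per_token: int) -> int:
    """Parameter count when `experts_per_token` experts are counted per layer."""
    attn = DIM * N_HEADS * HEAD_DIM              # W_q
    attn += 2 * DIM * N_KV_HEADS * HEAD_DIM      # W_k and W_v
    attn += N_HEADS * HEAD_DIM * DIM             # W_o
    expert = 3 * DIM * FFN_DIM                   # SwiGLU expert: three weight matrices
    router = DIM * N_EXPERTS                     # gating layer, no bias
    norms = 2 * DIM                              # two RMSNorm weight vectors per layer
    per_layer = attn + experts_per_token * expert + router + norms
    embeddings = 2 * VOCAB * DIM                 # input embedding + untied output head
    return N_LAYERS * per_layer + embeddings + DIM  # + final norm


total = count_params(N_EXPERTS)
active = count_params(TOP_K)
print(f"total  = {total / 1e9:.1f}B")   # total  = 46.7B
print(f"active = {active / 1e9:.1f}B")  # active = 12.9B
```

Under these assumptions the exact counts are 46,702,792,704 and 12,879,925,248. The entire difference (about 33.8B) is the six unselected experts in each of the 32 layers.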

Capabilities and applications

Native model capabilities
Language modeling
Ability to predict subsequent tokens and generate coherent natural-language text based on the preceding context.
Category: language
Coding
Generating, analysing and modifying source code.
Category: coding
Multilingual
Understanding and generating text in many languages.
Category: language
Long context
Maintaining coherence and focus across very long input contexts.
Category: language
Reasoning
The model's ability to reason logically and solve complex problems.
Category: reasoning

Benchmark results

2 benchmarks
MT-Bench: 8.30
📄 mistral.ai/news/mixtral-of-experts
Score for Mixtral 8x7B Instruct (SFT + DPO).
MMLU (accuracy): 70.6%
📄 mistral.ai/news/mixtral-of-experts

Technical architecture

Core Architecture
Model Form